
PSYCHOLOGICAL ASSESSMENT
FIFTH EDITION

EDITED BY CHERYL FOXCROFT AND GERT ROODT


Oxford University Press is a department of the University of Oxford. It furthers the University’s
objective of excellence in research, scholarship, and education by publishing worldwide. Oxford is a
registered trade mark of Oxford University Press in the UK and in certain other countries.
Published in South Africa by
Oxford University Press Southern Africa (Pty) Limited
Vasco Boulevard, Goodwood, N1 City, Cape Town, South Africa, 7460
P O Box 12119, N1 City, Cape Town, South Africa, 7463
© Oxford University Press Southern Africa (Pty) Ltd 2018

The moral rights of the author have been asserted.

Fifth edition published 2018


All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or
transmitted, in any form or by any means, without the prior permission in writing of Oxford University
Press Southern Africa (Pty) Ltd, or as expressly permitted by law, by licence, or under terms agreed
with the appropriate reprographic rights organisation, DALRO, The Dramatic, Artistic and Literary
Rights Organisation at dalro@dalro.co.za. Enquiries concerning reproduction outside the scope of
the above should be sent to the Rights Department, Oxford University Press Southern Africa (Pty)
Ltd, at the above address.
You must not circulate this work in any other form and you must impose this same condition
on any acquirer.
Introduction to psychological assessment in the South African context 5e
Print ISBN 978-0-190418-59-5
ePUB ISBN 978-0-190416-54-6
First impression 2018
Typeset in Helvetica Neue LT Std 10pt on 12pt
Acknowledgements
Publishing manager: Alida Terblanche
Publisher: Marisa Montemarano
Project manager: Gugulethu Baloyi
Editor: Deidre Donnelly
Designer: Jade Benjamin
Cover design: Judith Cross
Typesetter: Aptara
The authors and publisher gratefully acknowledge permission to reproduce copyright
material in this book. Every effort has been made to trace copyright holders, but if any
copyright infringements have been made, the publisher would be grateful for information
that would enable any omissions or errors to be corrected in subsequent impressions.
Links to third party websites are provided by Oxford in good faith and for
information only. Oxford disclaims any responsibility for the materials
contained in any third party website referenced in this work.
Photograph acknowledgements
Page 179: kaetana/Shutterstock
TABLE OF CONTENTS
List of contributors
Introduction

Part 1 Foundation Zone

1 An overview of assessment: Definition and scope
1.1 Introduction
1.2 About tests, testing, and assessment
1.3 Assessment measures and tests
1.3.1 Characteristics of assessment measures
1.3.2 Important issues
1.4 The assessment process
1.5 Psychological assessment in Africa
1.6 Conclusion

2 Psychological assessment: A brief retrospective overview
2.1 Introduction
2.2 A brief overview of the early origins of psychological assessment
2.2.1 Astrology
2.2.2 Physiognomy
2.2.3 Humorology
2.2.4 Phrenology
2.2.5 Chirology – Palmistry
2.2.6 Graphology
2.2.7 Summary
2.3 The development of modern psychological
assessment: An international perspective
2.3.1 Early developments
2.3.2 The early twentieth century
2.3.3 Measurement challenges
2.3.4 The influence of technology
2.3.5 The influence of multiculturalism
2.3.6 Standards, training, and test users’ roles
2.4 The development of modern psychological
assessment: A South African perspective
2.4.1 The early years
2.4.2 The early use of assessment measures in industry
2.4.3 The development of psychological
assessment from the 1960s onwards
2.4.4 Psychological assessment in democratic South Africa
2.4.4.1 Assessment in education
2.4.4.2 The Employment Equity Act
2.4.4.3 Professional practice guidelines
2.5 Can assessment measures and the process of
assessment still fulfil a useful function in modern society?
2.6 Conclusion

3 Basic measurement and statistical concepts
3.1 Introduction
3.2 Levels of measurement
3.2.1 Properties of measurement scales
3.2.1.1 Magnitude
3.2.1.2 Equal intervals
3.2.1.3 Absolute zero
3.2.2 Categories of measurement levels
3.3 Measurement errors
3.3.1 Random error
3.3.2 Systematic error
3.4 Measurement scales
3.4.1 Different scaling options
3.4.1.1 Category scales
3.4.1.2 Likert-type scales
3.4.1.3 Semantic differential scales
3.4.1.4 Intensity scales
3.4.1.5 Constant-sum scales
3.4.1.6 Paired-comparison scales
3.4.1.7 Graphic rating scales
3.4.1.8 Forced-choice scales
3.4.1.9 Ipsative scales (and scoring)
3.4.1.10 Guttman scales
3.4.2 Considerations in deciding on a scale format
3.4.2.1 Single dimensional versus composite scale
3.4.2.2 Question- versus statement-item formats
3.4.2.3 Type of response labels versus unlabelled response categories
3.4.2.4 Single attribute versus comparative rating formats
3.4.2.5 Even-numbered versus odd-numbered rating options
3.4.2.6 Ipsative versus normative scale options
3.5 Basic statistical concepts
3.5.1 Displaying data
3.5.2 Measures of central tendency
3.5.3 Measures of variability
3.5.4 Measures of association
3.6 Norms
3.6.1 The standard normal distribution
3.6.2 Establishing norm groups
3.6.3 Co-norming of measures
3.6.4 Types of test norms
3.6.4.1 Developmental scales
3.6.4.2 Percentiles
3.6.4.3 Standard scores
3.6.4.4 Deviation IQ scale
3.6.5 Interrelationships of norm scores
3.6.6 Setting standards and cut-off scores
3.6.7 Issues with norm use
3.7 Conclusion

4 Reliability: Basic concepts and measures
4.1 Introduction
4.2 Reliability
4.2.1 Defining reliability
4.2.2 Types of reliability
4.2.2.1 Test-retest reliability
4.2.2.2 Alternate-form reliability
4.2.2.3 Split-half reliability
4.2.2.4 Inter-item consistency
4.2.2.5 Inter-scorer reliability
4.2.2.6 Intra-scorer reliability
4.2.2.7 Contemporary approaches to estimating reliability
4.2.3 Factors affecting reliability
4.2.3.1 Respondent error
4.2.3.2 Administrative error
4.2.4 Interpretation of reliability
4.2.4.1 The magnitude of the reliability coefficient
4.2.4.2 Standard error of measurement
4.2.4.3 Reliability and mastery assessment
4.3 Conclusion

5 Validity: Basic concepts and measures
5.1 Introduction
5.2 Validity
5.2.1 Defining validity
5.2.2 Types of validity
5.2.2.1 Content-description procedures
5.2.2.2 Construct-identification procedures
5.2.2.3 Criterion-prediction procedures
5.2.2.4 Possible criterion measures
5.2.3 Unitary validity
5.2.4 Indices and interpretation of validity
5.2.4.1 Validity coefficient
5.2.4.2 Factors affecting the validity coefficient
5.2.4.3 Coefficient of determination
5.2.4.4 Standard error of estimation
5.2.4.5 Predicting the criterion: Regression analysis
5.3 Conclusion

6 Developing a psychological measure
6.1 Introduction
6.2 Steps in developing a measure
6.2.1 The planning phase
6.2.1.1 Establish a test-development team
6.2.1.2 Specify the aim, target population, and nature of the measure
6.2.1.3 Define the content of the measure
6.2.1.4 Other test specifications to include in the plan
6.2.2 Item development, creation or sourcing
6.2.2.1 Develop the items
6.2.2.2 Review the items
6.2.3 Assemble and pre-test the experimental version of the measure
6.2.3.1 Arrange the items
6.2.3.2 Finalise the length
6.2.3.3 Answer mechanisms and protocols
6.2.3.4 Develop administration instructions
6.2.3.5 Pre-test the experimental version of the measure
6.2.4 The item-analysis phase
6.2.4.1 Classical Test-Theory item analysis: Determine item difficulty (p)
6.2.4.2 Classical Test-Theory item analysis: Determine discriminating power
6.2.4.3 Item-Response Theory (IRT)
6.2.4.4 Identify items for final pool
6.2.5 Revise and standardise the final version of the measure
6.2.5.1 Revise the items and test
6.2.5.2 Select items for the final version
6.2.5.3 Refine administration instructions and scoring procedures
6.2.5.4 Administer the final version
6.2.6 Technical evaluation and establishing norms
6.2.6.1 Establish validity and reliability
6.2.6.2 Establish norms, set performance standards or cut-scores
6.2.7 Publish and refine continuously
6.2.7.1 Compile the test manual
6.2.7.2 Submit the measure for classification
6.2.7.3 Publish and market the measure
6.2.7.4 Revise and refine continuously
6.3 Evaluating a measure
6.4 Conclusion
7 Cross-cultural test adaptation, translation, and tests in multiple languages
7.1 Introduction
7.2 Reasons for adapting measures
7.3 Test adaptation
7.3.1 Types of test adaptation
7.3.2 Challenges related to test adaptation in South Africa
7.4 Equivalence of adapted versions of measures used in multicultural and multilingual contexts
7.4.1 Equivalence of measures adapted for cross-cultural use
7.4.2 Exploring linguistic equivalence
7.4.2.1 Judgemental approaches
7.4.2.2 Empirical investigations
7.5 Statistical approaches to establish equivalence
7.5.1 Bias and equivalence
7.5.2 Measurement bias and equivalence
7.5.2.1 Bias and equivalence at the item level
7.5.2.2 Bias and equivalence in factor structures between groups
7.5.2.3 Evaluation of measurement invariance (MI) approaches
7.5.3 Prediction bias and equivalence
7.5.3.1 Procedures to establish prediction bias and equivalence
7.5.4 Construct bias and construct equivalence
7.5.5 Nomological bias
7.5.6 Method bias
7.6 Steps for maximising success in test adaptations
7.7 Multicultural test development
7.8 Conclusion

Part 2 Assessment Practice Zone

8 The practice of psychological assessment: Controlling the use of measures, competing values, and ethical practice standards
8.1 Introduction
8.2 Statutory control of the use of psychological assessment measures in South Africa
8.2.1 Why should the use of assessment measures be controlled?
8.2.2 How control over the use of psychological assessment measures is
exercised in South Africa
8.2.2.1 Statutory control
8.2.2.2 The different categories of psychology professionals who
may use psychological measures
8.2.2.3 Other professionals who use psychological measures
8.2.2.4 The classification and evaluation of psychological measures
8.2.2.5 The Professional Board for Psychology and the protection of the public
8.3 Fair and ethical assessment practices
8.3.1 What constitutes fair and ethical assessment practices?
8.3.2 Why assessment practitioners need to ensure that their assessment practices are
ethical
8.3.3 Professional practices that assessment practitioners should follow
8.3.4 Rights and responsibilities of test-takers
8.3.5 Preparing test-takers
8.3.6 Ethical dilemmas
8.4 Multiple constituents and competing values in the
practice of assessment
8.4.1 Multiple constituency model
8.4.2 Competing values: An example from industry
8.4.3 Responsibilities of organisations as regards fair assessment practices
8.5 Conclusion

9 Administering psychological assessment measures
9.1 Introduction
9.2 Preparation prior to the assessment session
9.2.1 Selecting measures to include in a test battery
9.2.2 Checking assessment materials and equipment
9.2.3 Becoming familiar with assessment measures and instructions
9.2.4 Checking that assessment conditions will be satisfactory
9.2.4.1 Assessment rooms
9.2.4.2 Minimising cheating during group assessment and using
assessment assistants
9.2.5 Personal circumstances of the test-taker and the timing of assessment
9.2.6 Planning the sequence of assessment measures and the length of assessment sessions
9.2.7 Planning how to address linguistic factors
9.2.8 Planning how to address test sophistication
9.2.9 Informed consent
9.3 The assessment practitioner’s duties during assessment administration
9.3.1 The relationship between the assessment practitioner and the test-taker
9.3.1.1 Adopting a scientific attitude
9.3.1.2 Exercising control over groups during group assessment
9.3.1.3 Motivating test-takers
9.3.1.4 Establishing rapport
9.3.2 Dealing with assessment anxiety
9.3.3 Providing assessment instructions
9.3.4 Adhering to time limits
9.3.5 Managing irregularities
9.3.6 Recording assessment behaviour
9.3.7 Specific suggestions for assessing young children and individuals with
physical and mental disabilities
9.3.7.1 Specific suggestions for assessing young children
9.3.7.2 Assessment of individuals with physical disabilities
9.3.7.3 Assessment of mental disability
9.4 The assessment practitioner’s duties after assessment administration
9.4.1 Collecting and securing assessment materials
9.4.2 Recording process notes, scoring, and interpreting the assessment measures
9.5 Conclusion

Part 3 Types of Measures Zone

10 Assessment of cognitive functioning
10.1 Introduction
10.2 Theories of intelligence: A brief history and overview
10.2.1 Background
10.2.2 Defining intelligence
10.2.3 Theories of intelligence
10.3 Interpreting the intelligence score and diversity issues
10.3.1 The meaning of the intelligence score
10.3.2 Individual differences and cultural diversity
10.4 Measures of general cognitive functioning
10.4.1 Individual intelligence measures
10.4.1.1 Description and aim
10.4.1.2 The application of results
10.4.2 Group tests of intelligence
10.4.2.1 Description and aim
10.4.2.2 The application of results
10.5 Measures of specific abilities
10.5.1 Aptitude measures
10.5.1.1 Description and aim
10.5.1.2 The application of results
10.5.2 Measures of specific cognitive functions
10.5.2.1 Description and aim
10.5.2.2 The application of results
10.6 Scholastic tests
10.7 Current and new trends in cognitive assessment
10.8 Conclusion
11 Measures of well-being
11.1 Introduction
11.2 Well-being
11.2.1 Defining well-being
11.2.2 Positive psychology and well-being
11.2.3 Domains of well-being
11.2.3.1 Subjective well-being
11.2.3.2 Other dimensions of well-being
11.2.3.3 Mental illness
11.3 Well-being in the workplace
11.3.1 Why well-being matters
11.3.2 The cost of ill health
11.3.3 Well-being in the workplace
11.3.4 Wellness programmes
11.4 Well-being among university students
11.5 Measures of well-being
11.5.1 Assessment of well-being in diverse contexts
11.5.2 Assessment of well-being in the work context
11.6 Conclusion
12 Personality assessment
12.1 Introduction
12.2 A conceptual scheme for personality assessment
12.3 Level 1: The assessment of relatively stable personality traits
12.3.1 The 16 Personality Factor Questionnaire (16PF)
12.3.2 The Big Five model of personality traits
12.3.3 The Basic Traits Inventory
12.3.4 The Hogan Personality Inventory
12.3.5 The Occupational Personality Questionnaire
12.3.6 The Myers-Briggs Type Indicator
12.3.7 The Minnesota Multiphasic Personality Inventory (MMPI)
12.3.8 Cross-cultural use of structured personality assessment measures
12.3.9 The South African Personality Inventory (SAPI)
12.4 Level 2: Assessment of motives and personal concerns
12.4.1 Explicit motives
12.4.2 Implicit motives
12.4.3 Other projective methods of personality assessment
12.5 Personality and counterproductive behaviour at work
12.5.1 Work-related Risk and Integrity Scales (WRISc)
12.5.2 Hogan Development Survey
12.6 Uses and applications of personality tests
12.7 Conclusion
13 Career-counselling assessment
13.1 Introduction
13.2 The person-environment fit approach
13.2.1 Assessing intelligence
13.2.2 Assessing aptitude
13.2.3 Assessing vocational interests
13.2.4 Assessing personality
13.2.5 Assessing values
13.2.6 Evaluation of the person-environment fit approach to career counselling assessment
13.3 The developmental approach to career counselling assessment
13.3.1 Career assessment in the developmental approach
13.3.2 Evaluation of the developmental approach to career counselling assessment
13.4 The systems approach to career counselling assessment
13.4.1 Career assessment in the systems approach
13.4.2 Evaluation of the systems approach to career counselling assessment
13.5 Career construction theory and life-design counselling
13.5.1 Assessment in career construction theory and life-design counselling
13.5.2 Evaluation of career construction theory and life-design counselling
13.6 Career counselling in a changing environment
13.7 Conclusion

14 Computer-based and Internet-delivered assessment
14.1 Introduction
14.2 Some terms explained
14.3 Computer-based and Internet-delivered assessments in South Africa
14.3.1 Early South African developments
14.3.2 Technical limitations of early testing systems
14.3.3 A wider choice of tests becomes available
14.4 Computer-based testing in the twenty-first century
14.4.1 A wider variety of tests
14.4.2 Internet-delivered testing and cloud storage
14.4.3 Changes in the size and scope of assessment projects
14.4.4 Changes in reporting and interpretation
14.4.5 New types of assessments
14.4.6 Non-commercial resources become available for test developers and users
14.4.7 Artificial intelligence
14.4.8 Summary
14.5 Ethical and legal use of computer-based and Internet-delivered testing in South Africa
14.6 Conclusion

Part 4 Assessment Practice Zone (2)

15 The use of assessment measures in various applied contexts
15.1 Introduction
15.2 Assessment in industry
15.2.1 Assessment of individuals
15.2.1.1 Personnel selection
15.2.1.2 Performance ratings or assessment
15.2.1.3 Interviews
15.2.1.4 Situational tests
15.2.2 Assessment of workgroups or work teams
15.2.3 Assessment of organisations
15.2.4 Organisational research opportunities
15.3 Infant and preschool developmental assessment
15.3.1 Why do we assess development?
15.3.2 What functions do we assess?
15.3.3 How do we assess?
15.3.4 What developmental measures are available for South African children?
15.3.4.1 Global developmental screening measures
15.3.4.2 Educationally focused screening measures
15.3.4.3 Diagnostic measures
15.3.4.4 Shortcomings of developmental assessment
15.3.4.5 Information from other sources and observations
15.4 The role of psychological assessment in education
15.4.1 The uses of psychological measures in educational settings
15.4.1.1 Uses for school-age learners
15.4.1.2 Uses for educational accountability purposes
15.4.2 Types of measures used in educational settings
15.4.2.1 Achievement measures
15.4.2.2 Aptitude measures
15.4.2.3 General and specific cognitive measures
15.4.2.4 Personality-related measures
15.4.2.5 Dynamic assessment
15.4.3 Uses of tests in higher-education contexts
15.5 Psychodiagnostic assessment
15.5.1 What is psychodiagnostic assessment?
15.5.2 Assessment measures and procedures
15.5.2.1 Interviews
15.5.2.2 Psychological assessment measures
15.5.3 Psychological knowledge and expertise
15.5.4 Psycholegal or forensic assessment
15.5.5 Limitations of psychodiagnostic assessment
15.6 Conclusion

16 Interpreting and reporting assessment results
16.1 Introduction
16.2 Interpretation
16.2.1 The relation between interpretation and validity
16.2.1.1 Descriptive interpretation
16.2.1.2 Causal interpretation
16.2.1.3 Predictive interpretation
16.2.1.4 Evaluative interpretation
16.2.2 Methods of interpretation
16.2.2.1 Mechanical interpretation of assessment results
16.2.2.2 Non-mechanical interpretation of assessment results
16.2.2.3 Combining mechanical and non-mechanical approaches
16.2.3 Interpretation of norm-referenced tests
16.2.4 Interpretation of criterion-referenced measures
16.3 Principles for conveying test results
16.3.1 Ethical considerations
16.3.1.1 Confidentiality
16.3.1.2 Accountability
16.3.2 Methods of conveying assessment results
16.3.3 Conveying children’s assessment results
16.4 Reporting assessment results in written form
16.5 Conclusion

17 Factors affecting assessment results
17.1 Introduction
17.2 Viewing assessment results in context
17.2.1 The biological context
17.2.1.1 Age-related changes
17.2.1.2 Physical impairments
17.2.2 The intrapsychic context
17.2.2.1 Transient conditions
17.2.2.2 Psychopathology
17.2.3 The social context
17.2.3.1 Schooling
17.2.3.2 Language
17.2.3.3 Culture
17.2.3.4 Environmental factors
17.2.3.5 Test-wiseness
17.3 Methodological considerations
17.3.1 Test administration and standardised procedures
17.3.2 Interpreting patterns in test scores
17.3.3 The influence of the assessment practitioner
17.3.4 Status of the test-taker
17.3.4.1 Anxiety, test-taking effort, and motivation
17.3.4.2 Faking bad, malingering, and faking good
17.3.4.3 Cheating
17.3.4.4 Practice effects
17.3.5 Bias and construct validity
17.4 Conclusion

18 On the road to culturally meaningful psychological assessment in South Africa
18.1 Introduction
18.2 Culturally meaningful psychological assessment: The journey so far
18.2.1 Conceptualising how to combine African-centred and Western-oriented approaches
18.2.2 Progress in developing indigenous, African-centred measures
18.2.3 Progress in adapting and researching Western-oriented measures
for the South African context
18.3 Roadmap for the next phase of the journey
18.3.1 Culturally appropriate measures
18.3.1.1 Routes to developing indigenous measures and adapting
Western-oriented measures
18.3.1.2 Inspiration for African-centred item development
18.3.1.3 Delineating norm groups
18.3.1.4 Qualitative measures
18.3.1.5 Creating repositories of research studies into assessment measures
18.3.2 Capacity-building in test development and adaptation
18.3.3 Foster good ethical practices and address assessment misuse
18.4 Conclusion
LIST OF CONTRIBUTORS
• Prof. Fatima Abrahams, Department of Industrial Psychology,
University of the Western Cape (UWC).
• Prof. Marié de Beer, Professor Emeritus retired from the
Department of Industrial & Organisational Psychology, University
of South Africa (UNISA).
• Prof. Gideon P. de Bruin, Professor in the Department of
Industrial Psychology, Stellenbosch University (US).
• Dr Karina de Bruin, Managing Director, JvR Academy, and
Research Associate in the Department of Psychology, University
of the Free State (UFS).
• Prof. Francois De Kock, Associate Professor in the Section of
Organisational Psychology, University of Cape Town (UCT).
• Prof. Cheryl Foxcroft, Dean: Teaching and Learning and
Professor of Psychology, Nelson Mandela University.
• Dr Carolina Henn, Senior Lecturer in the Department of Industrial
Psychology and People Management, University of Johannesburg
(UJ).
• Dr Loura Griessel, Senior Lecturer in the Department of Industrial
Psychology, University of the Free State (UFS).
• Prof. Kate Grieve, Former Associate Professor in the Department
of Psychology, University of South Africa (UNISA), and Clinical
Psychologist (area of interest: Neuropsychology).
• Prof. Gert Roodt, former Vice-dean: Research in the previous
Faculty of Management, University of Johannesburg (UJ).
• Prof. Louise Stroud, Professor in the Psychology Department
and Director: School of Behavioural Sciences and School of
Lifestyle Sciences, Nelson Mandela University.
• Dr Nicola Taylor, Director of Research at JvR Psychometrics,
Randburg.
• Nanette Tredoux, CEO of Psytech South Africa, former Senior
Research Specialist at the Human Sciences Research Council
specialising in computerised assessments.
• Prof. René van Eeden, Professor in the Department of
Psychology, University of South Africa (UNISA).

The editors would like to thank the following authors for their
contributions to previous editions of Introduction to Psychological
Assessment: Prof. Diane Elkonin, Caroline Davies, Dr Jenny
Jansen, and Prof. Anil Kanjee.
INTRODUCTION
The start of an exciting journey
Hi there! You are about to embark on an interesting, challenging
journey that will introduce you to the world of psychological
assessment. Interesting? Challenging? ‘That’s not what I have heard
about assessment,’ you might say. Many people view courses and
books on psychological assessment as dull, boring, and not relevant
for students and professionals in people-oriented careers. Even if you
have some doubts that you are going to enjoy learning about
assessment, we beg you to give us a chance to share our enthusiasm
about assessment with you and to show you why our enthusiasm is
shared by countless others in South Africa and, indeed, all over the
world. As you read this book, we hope that you will discover that
assessment is an essential, integral part of psychology and other
people-oriented professions. You will be exposed to the fact that
assessment can be used in many different contexts, such as
educational, counselling, clinical, psychodiagnostic,
psycholegal/forensic, industrial (occupational), and research contexts.
Furthermore, you will discover that assessment can be used for many
different purposes (e.g. diagnosing learning problems, assessing
whether a mass murderer is fit to stand trial, determining whether
therapeutic intervention has been successful, and making job
selection, placement, and training decisions). Does assessment still
sound dull, boring, and irrelevant to you? No ways! Well then, let the
journey begin.

Requirements for our journey


Whenever we set off on a journey, we need to make sure that we
have a suitable mode of transport, a map, and other necessities such
as fuel, refreshments, a first-aid kit, a reliable spare tyre, and so on.
For your journey into psychological assessment territory:
• your vehicle will be the contents of this book, which have been
carefully put together by a number of prominent people in the
assessment field in South Africa
• you will be provided with a map, so that you have some idea of the
places that we will visit and how those places can be grouped
together
• a set of signposts will be provided to help you to identify
important landmarks along the way and to correlate the
information that you obtain in the various chapters
• you have to bring along an enquiring, open mind as you will find
that we are going to challenge you to think about what you are
reading and to develop personal insights
• you will need to keep a pen and a
notebook/laptop/netbook/iPad/tablet handy at all times so that
you can record your responses to challenging questions and case
studies, as well as document your growing understanding of
psychological assessment.

The map for our journey


You will notice that the map has been divided into three zones, which
provide a logical grouping of the various stops (or chapters) that you
will make along the way. This means that instead of thinking of this as
a book (or journey) with 18 chapters (or stops), you should rather
visualise three major focus points in your journey, as this makes it
more manageable:
• In the Foundation Zone, which comprises Chapters 1 to 7, we will
introduce you to the basic concepts and language of assessment
as well as to how measures are developed and adapted so that
they are culturally appropriate.
• In the Assessment Practice Zone, which comprises Chapters 8
to 9 and 15 to 18, we will tackle various issues concerning who
may use measures, good assessment practices related to
administration, scoring, interpretation and reporting, the contexts in
which assessment is used, and the progress being made in terms
of psychological assessment becoming more culturally meaningful
in South Africa.
• In the Types of Measures Zone, which comprises Chapters 10 to
14, we will focus on the different types of measures that are used
to measure various aspects of human behaviour and functioning
as well as on computer-based and Internet-delivered testing.

Please note that we indicated only key aspects of chapter titles on the map (e.g. 1 Assessment overview). An important aspect of your
journey is the speed at which you travel. Take things at a pace that
you find comfortable. Speeding could result in a failure to grasp some
of the essential concepts. Do not be alarmed if you find that you
sometimes need to retrace your steps, especially when you travel
through the Foundation Zone. It is better to revisit any concepts that
you find difficult to understand in the early part of the book, as the
remainder of the book builds on the knowledge that you acquire in
the Foundation Zone.

Signposts for our journey


As you travel through each chapter you will find a number of
signposts.
Chapter outcomes
Before you are provided with any material, you will be given an
overview of the outcomes of the chapter. This will enable you to work
through the contents of the chapter in a focused way.

Concepts
To help you to identify the really important concepts in a chapter, they
will always be presented in boldface or italics.

Critical thinking challenge and case studies


At an appropriate point in each chapter, you will find a case study, or
something to get you thinking. The purpose of presenting you with this
material is to encourage you to think critically about the material and to
apply your new knowledge about assessment to real-life situations. We
will provide you with some cues to help you with some of the critical
thinking challenges and case studies. Although each chapter includes a
case study, you should browse through some of the other chapters when
answering the questions posed for a case study. Furthermore, as the
case studies focus on ethical dilemmas, you should particularly consult
the Ethical Rules of Conduct for Practitioners Registered under the
Health Professions Act, 1974
(http://www.hpcsa.co.za/downloads/conduct_ethics/rules/ethical_rules_
when working through the case studies.

Check your progress


At the end of every chapter you will find a few questions that you
should try and answer. This will give you a good indication of whether
you understood all the concepts and issues explained in the chapter.

Web sites and advanced readings


Something else that is important is to note that we have included
many references to work published in South African journals. We
would encourage you to access the relevant articles as they will
expand on what has been included in this book. In addition, many
chapters include references to advanced readings that you could
consult to read further on a topic. Furthermore, there are a number of
Web sites that could be accessed where you can get up-to-date
knowledge and information about psychological assessment and
testing. For example:
• http://en.wikipedia.org
• www.psychtesting.org.uk
• www.intestcom.org
• https://www.intestcom.org/page/4 and
https://www.intestcom.org/page/5 (for online readings in testing
and assessment)
• http://www.apa.org/topics/testing/index.aspx
• www.psyssa.com
• www.siopsa.org.za

We hope that you find your journey through this book as interesting
as we found the conceptualisation and preparation stages. We would
also like to thank all those who contributed to making this book a
reality, either by way of writing sections or on the editorial,
typesetting, and production side. The contributors, in particular, need
to be thanked for their time and effort in preparing their insightful,
scholarly contributions.
Finally, we would like you to let us know your impressions of this
book, and any suggestions you have for improving it. Send your
comments to Cheryl at the following email address:
Cheryl.Foxcroft@mandela.ac.za.

Cheryl Foxcroft and Gert Roodt


Foundation Zone
When you look at the map that guides your journey through this book, you
probably don’t realise that the first zone involves climbing up a rather
steep hill. The reason why the hill is so steep is that you will be introduced
to most of the important concepts in psychological assessment and
measurement in this zone. These concepts are not always easy to
understand, so we suggest that you proceed slowly through this zone. Do
not be afraid to turn back and retrace your steps if you need to.

1 Assessment overview
2 Historical perspectives
3 Basic measurement concepts
4 Reliability
5 Validity
6 Developing a measure
7 Cross-cultural perspectives
Chapter 1
An overview of assessment: Definition and scope
CHERYL FOXCROFT AND GERT ROODT

CHAPTER OUTCOMES
By the end of this chapter you will be able to:
❱ distinguish between tests, assessment measures, testing and
psychological and competency-based assessment
❱ name the characteristics of assessment measures
❱ understand the value of adopting an Africa-centred approach
in South Africa
❱ recognise the limitations of assessment results
❱ explain assessment as a multidimensional process.

1.1 Introduction
We constantly have to make decisions in our everyday lives: what to
study, what career to pursue, whom to choose as our life partner,
which applicant to hire, how to put together an effective team to
perform a specific task, and so on. This book will focus on the role of
psychological assessment in providing information to guide
individuals, groups, and organisations to understand aspects of their
behaviour and make informed and appropriate decisions. In the
process you will discover that assessment can serve many purposes.
For example, assessment can help to identify strengths and
weaknesses, map development or progress, inform decisions
regarding suitability for a job or a field of study, identify training and
education needs, or it can assist in making a diagnosis. Assessment
can also assist in identifying intervention and therapy needs,
measuring the effectiveness of an intervention programme, and in
gathering research data to increase psychology’s knowledge base
about human behaviour or to inform policy-making.
South Africa is situated in Africa, and professionals and
researchers are increasingly recognising the importance and value of
adopting a more African-centred approach. As South African society
is multicultural and multilingual in nature, this book also aims to raise
awareness about the role of culture and language in assessment, and
to suggest ways of addressing them from an African-centred
perspective. As we begin our journey in this chapter you will be
introduced to key terms, the characteristics of assessment measures,
the multidimensional nature of the assessment process, and the
limitations of assessment results.

1.2 About tests, testing, and assessment


One of the first things that you will discover when you start journeying
through the field of psychological assessment is that many confusing
and overlapping terms are used. As you travel through the Foundation
Zone of this book, you must try to understand the more important
terms and think about how they are interlinked.
In essence, tools are available to make it possible for us to assess
(measure) human behaviour. You will soon realise that various names
are used to refer to these tools such as tests, measures,
assessment measures, instruments, scales, procedures, and
techniques. To ensure that the measurement is valid and reliable, a
body of theory and research regarding the scientific measurement
principles that are applied to the measurement of psychological
characteristics has evolved over time. This subfield of psychology is
known as psychometrics, and you will often hear of the
psychometric properties of a measure or the term psychometric
measure. Psychometrics refers to the systematic and scientific way
in which psychological measures are developed and the technical
measurement standards (e.g. validity and reliability) required of such
measures.
When we perform an assessment, we normally use assessment
tools as well as other information that we obtain about a person (e.g.
school performance, qualifications, life and work history, family
background). Psychological assessment is a process-orientated
activity aimed at gathering a wide array of information by using
psychological assessment measures (tests) and information from
many other sources (e.g. interviews, a person’s history, collateral
sources). We then evaluate and integrate all this information to reach
a conclusion or make a decision. Seen from this perspective, testing
(i.e. the use of tests, measures), which involves the measurement of
behaviour, is one of the key elements of the much broader evaluative
process known as psychological assessment.
Traditionally, psychology professionals have performed all aspects
of the assessment process (test selection and administration, scoring,
interpreting and reporting/providing feedback). Furthermore, as the
outputs of psychological assessment are in the form of psychological
traits/constructs (such as personality and ability), the expertise to
perform psychological assessment is a core competence of an
appropriately registered psychology professional. However, in the
modern testing era, there has been a shift in the role of the
psychology professional in the assessment process in the work and
organisational context in particular (Bartram, 2003). This shift has
been due to the advent of Internet-delivered computer-based tests
as well as the growing impact of technological developments on
testing (see Chapter 14) and competency-based assessment (see
Section 2.3.6, which focuses on assessing the skills, behaviours,
attitudes/values required for effective performance in the workplace or
educational/training settings, the results of which are directly linked to
the competency language of the workplace or educational setting).
These factors have redefined who a test user is and the role of the
psychology professional in assessment, given that computer-based
tests require little, if any, human intervention in terms of administration
and scoring, and computer-generated test interpretation and reports
are often available. This results in some test users in the workplace
only needing the skill to use and apply the competency-based outputs
from a test, which does not require extensive psychological training.
There is much controversy around the shifting role of the psychology
professional and the role of non-professionals in assessment in some
areas of psychology. This controversy will be debated further in
Chapters 2, 8 and 14 and has even been raised in legal cases (e.g.
Association of Test Publishers of South Africa and Another, Saville
and Holdsworth South Africa v. The Chairperson of the Professional
Board of Psychology, unreported case number 12942/09, North
Gauteng High Court, 2010).
Having summarised the relationship between tests, testing, and
assessment, let us now consider each aspect in more depth.
However, you should be aware that it is almost impossible to define
terms such as ‘test’, ‘testing’, ‘assessment measure’, and
‘assessment’.

Any attempt to provide a precise definition of ‘test’, or of ‘testing’ as a process, is likely to fail as it will tend to exclude some procedures that should be included and include others that should be excluded (International Test Commission, 1999, p. 8).

Instead of focusing on specific definitions, we will highlight the most important characteristics of assessment measures (tests) and assessment.

1.3 Assessment measures and tests


1.3.1 Characteristics of assessment measures
Although various names are used to refer to the tools of assessment,
this book will mainly use assessment measure and test. Preference
is given to the term assessment measure as it has a broader
connotation than the term test, which mainly refers to an objective,
standardised measure that is used to gather data for a specific
purpose (e.g. to determine what a person’s intellectual capacity is).
The main characteristics of assessment measures are as follows:
• Assessment measures include many different procedures that can be used in psychological, occupational, and educational assessment and can be administered to individuals, groups, and organisations. Specific domains of functioning (e.g. intellectual ability, personality, organisational climate) are sampled by assessment measures. From these samples, inferences can be made about both normal and abnormal (dysfunctional) behaviour or functioning.
• Assessment measures are administered under carefully controlled (standardised) conditions.
• Systematic methods are applied to score or evaluate assessment protocols.
• Guidelines are available to understand and interpret the results of an assessment measure. Such guidelines may make provision for the comparison of an individual’s performance to that of an appropriate norm group or to a criterion (e.g. a competency profile for a job), or may outline how to use test scores for more qualitative classification purposes (e.g. into personality types or diagnostic categories). (A small worked example of a norm comparison follows this list.)
• Assessment measures should be supported by evidence that they are valid and reliable for the intended purpose. This evidence is usually provided in the form of a technical test manual.
• Assessment measures are usually developed in a certain context (society or culture) for a specific purpose and the normative information used to interpret test performance is limited to the characteristics of the normative sample. Consequently, the appropriateness of an assessment measure for an individual, group, or organisation from another context, culture, or society cannot be assumed without an investigation into possible test bias (i.e. whether a measure is differentially valid for different subgroups) and without strong consideration being given to adapting and re-norming the measure. In Chapter 7 you can read about cross-cultural test adaptation. Furthermore, in the historical overview of assessment provided in Chapter 2, you will see what a thorny issue the cross-cultural transportability of measures has become, especially in the South African context.
• Assessment measures vary in terms of:
– how they are administered (e.g. group, individual, or on a
computing device)
– whether time limits are imposed. In a speed measure there is a
large number of fairly easy items of a similar level of difficulty.
These need to be completed within a certain time limit, with the
result that almost no one completes all the items in the specified
time. In power measures on the other hand, time limits are not
imposed so that all test-takers may complete all the items.
However, the items in a power measure get progressively more
difficult
– how they are scored (e.g. objectively with scoring masks or more
subjectively according to certain guidelines)
– how they are normed (e.g. by using a comparison group or a
criterion)
– what their intended purpose is (e.g. screening versus diagnostic,
competency-based testing)
– the nature of their items (e.g. verbal items, performance tasks)
– the response required from the test-taker (e.g. verbally, via
pencil and paper, by manipulating physical objects, by pressing
keys on a computer keyboard or touching elements on a screen)
– the content areas that they tap (e.g. ability or personality-related).
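
To make the norm-group comparison mentioned in the list above concrete, here is a minimal worked example using hypothetical numbers (not figures from any actual test manual). A raw score is first converted to a standard (z) score using the norm group’s mean and standard deviation (these statistics are explained in Chapter 3):

$$ z = \frac{X - \bar{X}}{SD} = \frac{42 - 36}{4} = 1.5 $$

Read against the standard normal distribution, a z-score of 1.5 places the test-taker at roughly the 93rd percentile, i.e. he or she performed better than about 93 per cent of the norm group on this measure.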

All the characteristics of measures outlined above will be amplified throughout this book. Furthermore, various ways of classifying
assessment measures are available. In Chapter 8 we discuss the
classification system used in South Africa as well as the criteria used
in deciding whether a measure should be classified as a psychological
test or not. You will also notice that the different types of measures
presented in Chapters 10 to 14 have largely been grouped according
to content areas.

1.3.2 Important issues


There are two important aspects about assessment results that you
should always keep in mind. Firstly, test results represent only one
source of information in the assessment process. Unfortunately,
assessment measures, because they offer the promise of objective
measurement, often take on magical proportions for assessment
practitioners who begin to value them above their professional
judgement or opinion. Lezak (1987) reminds us that psychological
assessment measures:

are simply a means of enhancing (refining, standardising) our observations. They can be thought of as extensions of our organs of perception … If we use them properly … they can enable us to accomplish much more with greater speed. When tests are misused as substitutes for, rather than extensions of, clinical observation, they can obscure our view of the patient (Lezak, 1987, p. 46).

Secondly, we need to recognise the approximate nature of assessment (test) results. Why? The results obtained from
assessment measures always need to be bracketed by a band of
uncertainty because errors of measurement creep in during
administration, scoring, and interpretation. Furthermore, the social,
economic, educational, and cultural background of an individual can
influence his or her performance on a measure to the extent that the
results present a distorted picture of the individual (see Chapter 17).
Thorndike and Hagen (1977), however, point out that it is not just in
the field of psychology where assessment information may be subject
to error. The teacher’s perception of how well a child reads, the
medical doctor’s appraisal of a person’s health status, the social
worker’s description of a home environment, and so on, all represent
approximate, informed, yet somewhat subjective, and thus potentially
fallible (incorrect), opinions.
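
The ‘band of uncertainty’ referred to above can be expressed numerically using the standard error of measurement, which is covered in detail in Chapter 4. As a minimal sketch with hypothetical numbers, suppose a measure has a standard deviation of 15 (an IQ-type scale) and a reliability coefficient of 0.91:

$$ SEM = SD\sqrt{1 - r_{xx}} = 15\sqrt{1 - 0.91} = 4.5 $$

An observed score of 110 would then be bracketed by a 95 per cent confidence band of approximately $110 \pm 1.96 \times 4.5$, that is, roughly 101 to 119. Reporting such a band, rather than a single ‘true’ score, makes the approximate nature of the result explicit.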

1.4 The assessment process


The assessment process is multidimensional in nature. It entails the
gathering and synthesising of information as a means of describing
and understanding functioning. This can inform appropriate decision-
making and intervention.
The information-gathering aspect of the assessment process will
be briefly considered. ‘Test performance in a controlled clinic situation
with one person is not a representative sample of behavior!’ (Bagnato
& Neisworth, 1991, p. 59). Information-gathering itself must be
multidimensional. Table 1.1 highlights the varied aspects across which
information could be purposefully gathered, with the purpose of the
assessment determining which aspects are appropriate.
Table 1.1 Multidimensional information-gathering (sources of information, with examples)

• Multiple measures: Different types of assessment measures such as norm-based and criterion-referenced tests, interviews, behavioural observation, rating scales completed by teachers or supervisors, and ecologically-based measures that describe the social or occupational context of an individual could be used.
• Multiple domains: The following could be assessed, for example: attention; motor, cognitive, language-related, non-verbal, and personality-related functioning; interest; scholastic achievement; and job performance.
• Multiple sources: Consult with other professionals, teachers, parents, extended family members, and employers.
• Multiple settings: Assessment could take place in a variety of settings (e.g. home, school, work, consulting rooms) and social arrangements (e.g. one-to-one, with peers, with parents present) to get as broad a perspective as possible of a person’s functioning and the factors that influence it.
• Multiple occasions: For assessment to be relevant, valid, and accurate, patterns of functioning have to be identified over a period of time and not merely in a single assessment session.

By gathering a wide array of data in the assessment process, a richer and broader sampling of behaviour or functioning can be
achieved. However, the assessment battery (i.e. the combination of
measures used) must be tailored to an individual, group, or
organisation’s needs (e.g. in terms of their age, level of ability and
disability, capacity, job analysis) as well as to the purpose of the
assessment. You can read more about the tailoring of assessment
and the selection of tests for a battery in Chapter 9.
After gathering the information, all the information must be
synthesised, clustered together, and weighed up in culturally-sensitive
ways to describe and understand the functioning of an individual,
group, or organisation. Based on such descriptions, predictions can be
made about future functioning, decisions can be made, interventions
can be planned, and progress can be mapped, among other things.
It is particularly in the synthesis and integration of assessment
information that much skill and professional judgement is required to
identify the underlying patterns of behaviour and to make appropriate
deductions. This is why you need to draw on all your knowledge from
all areas of psychology, and not just from the field of assessment,
when you perform a psychological assessment.
However, it is important to recognise the limits of human wisdom
when reaching opinions based on assessment information. Why?
When assessment practitioners synthesise and integrate assessment
information, they do so in as professional and responsible a way as
possible, using all the wisdom at their disposal. At the end of this
process, they formulate an informed professional opinion. Even
though it is an informed opinion, it is nonetheless an opinion, which
may or may not be correct. The assessment practitioner may be fairly
certain that the correct opinion has been arrived at, but absolute
certainty is not possible. Nonetheless, increasing emphasis is being
placed on the consequences of the outcomes of assessment for
individuals and organisations and the responsibility of the psychology
professional in this regard (see Chapter 8). For example, if a wrong
selection decision is made when hiring a senior-level employee,
possible negative outcomes for an organisation could be: added
financial risk, lowering of effectiveness and efficiency, bad publicity
and legal action (related to unfair labour practices if the employee is
retrenched/fired).

❱❱ CRITICAL THINKING CHALLENGE 1.1

In your own words, explain what the assessment process involves and where assessment measures fit into the picture.

As we conclude this section, another crucial aspect of the assessment process needs to be highlighted. Can you guess what it is? Various
parties, often with competing motives and values, are involved in the
assessment process. The person doing the assessment, the person
being assessed, and external parties such as employers, education
authorities, or parents, all have a stake in the outcome of the
assessment. Therefore, it is important that the assessment practitioner
performs assessment in a professional, ethical manner and that the
rights, roles, and responsibilities of all the stakeholders involved are
recognised and respected. This will be explored in more detail in
Chapter 8, including unpacking the Ethical Rules of Conduct for
Practitioners Registered under the Health Professions Act, 1974
(Department of Health, 2006) in terms of psychological assessment.

1.5 Psychological assessment in Africa


You now have a broad understanding of some of the key terms and
the characteristics of psychological assessment measures, and know
that assessment is a process that includes, but is not limited to, the
application of psychological tests. In many ways, assessing human
behaviour is a universal phenomenon. Based on our interactions with
a person, we form a judgement or opinion of them, which leads us to
conclude that someone is ‘clever’, ‘wise’, ‘assertive’, or ‘lacks self-
confidence’. Such assessment is informal. When it comes to the
formal assessment of human behaviour using standard measures,
psychological testing and assessment is part of the universal
discipline of psychology that is practised around the world. The key,
however, is that the tests that you use need to be contextually
sensitive and relevant for the assessment to yield valid and reliable
results that can be used to make decisions, plan interventions, and so
on.
In Chapter 2 we trace the history of the development of
psychological testing universally and in South Africa. As you will see,
psychological testing as we know it today largely originated in Europe
and America, and reached the shores of Africa when, for example,
countries were colonised, or when African scholars studied and
trained abroad in the colonial era and brought Westernised measures
back with them. For too long, Westernised assessment approaches
and measures were uncritically transported into the African context.
Fortunately, the last decade has seen an increasing emphasis on the
necessity of exploring indigenous cultures’ understandings of
psychological constructs, and how assessment is approached, to
inform the development of indigenous measures or the adaptation of
Western-orientated assessment measures so that they are culturally
appropriate and produce valid assessment results in the multicultural
African context (e.g. Mkhize, 2004; Moletsane, 2016; Mpofu, Ntinda &
Oakland, 2012; Zuilkowski, McCoy, Serpell, Matafwali & Fink, 2016).
Nsamenang (2007, p. 23) argues that in doing so we will develop a
psychology that is ‘truly tuned to Africa, and sensitive and useful to
African needs’.
Throughout this book the progress being made in adopting an
African-centred approach during the development and adaptation of
tests and approaches, test administration, and the culturally sensitive
interpretation of test results is stressed. Note this progress as you
work through each chapter. You can then compare your notes to the
summary of the progress made and the road still to be travelled that is
provided in Chapter 18.
However, adopting an African-centred approach does not exclude
drawing on and contributing to the Western-orientated psychological
assessment knowledge base. Why? Given ‘the fluidity and changing
nature of communities and cultures in the global context’ (Van der
Merwe, Ntinda, & Mpofu, 2016, p. 40), African communities also find
themselves in transition. Increased globalisation leads to greater
acceptance of diverse cultures and a fusion of Western- and African-
based views, values, and ideas by African youth, for example (Mkhize,
2004; Ratele, 2004). To best cater for the diversity of African people,
we need to adopt an integration of African- and Western-centred
approaches to enrich all aspects of psychological assessment
(Moletsane, 2016; Mpofu, Ntinda, & Oakland, 2012; Zuilkowski, et al.,
2016). Furthermore, African-centred psychology is part of the
universal discipline of psychology. Consequently, ‘its potential to
enrich and extend the discipline is great’ (Nsamenang, 2007, p. 32).
A list of readings is provided at the end of the chapter for you to
explore approaches to psychology that are centred in Africa in more
depth.

❱❱ CRITICAL THINKING CHALLENGE 1.2

To what extent is indigenous assessment of cognitive and personality functioning practised in African cultures? What implications does your answer have for the argument that we need to draw on and integrate African- and Western-orientated approaches to psychological assessment? Some of the readings at the end of the chapter could help you grapple with these questions.

CHECKING YOUR PROGRESS 1.1


1.1 Define the following terms:
• Psychometrics
• Psychological assessment
• Testing
• Competency-based assessment
• Assessment measure
• Assessment battery
• Integration of African- and Western-centred approaches to assessment.

Another, more fun, way to check your progress is to see how you
respond to the questions posed after reading through the following
case study:

❱❱ ETHICAL DILEMMA CASE STUDY 1.1

Lindiwe, a Grade 12 learner, consults a Registered Counsellor (RC) for career guidance. Based on an interview and the results of the Self-Directed Search
(SDS) (see Chapter 13, Section 13.2.3), the Basic Traits Inventory (Section
13.2.5.3) and the Differential Aptitude Tests Form R (Chapter 10, Section
10.5.1.1) it is found that Lindiwe has a strong interest in working in the legal
field, has an above-average verbal ability, and has the general ability to
succeed at further studies. The RC advises her to pursue a legal career and
enrol for a law degree at a university. Lindiwe does so and emails the RC four
years later with the news that she has successfully completed her studies and
has been accepted to do her articles at a reputable law firm.

Questions
(a) What would have guided the RC in deciding how to address the
referral question of providing Lindiwe with career guidance?
(Hint: what do you need to know about Lindiwe’s context?)
(b) In what ways did the RC demonstrate responsible use of
assessment measures in this case study?
(c) Is there evidence of irresponsible assessment practices in this case
study that could have resulted in negative consequences for Lindiwe?
1.6 Conclusion
This chapter introduced you to key terms in the field of psychological
assessment. You learned about the tools (assessment measures or
tests) used in psychological testing and assessment, their
characteristics, and the psychometric properties (measurement
standards) required of them. You further learned that test results are
only one source of information in an assessment process and that the
conclusions reached from them are always approximate in nature. The
multidimensional nature of the assessment process, and the
competing values and motives of those involved in the assessment
process, were also highlighted. The chapter concluded with the
proposition that for culturally appropriate assessment to be conducted
in South Africa and the African continent, a fusion of African-centred
and Western-orientated approaches is needed.

ADVANCED READING
Mkhize, N. (2004). Psychology: An African perspective. In K. Ratele, N. Duncan, D.
Hook, P. Kiguwa, N. Mkhize & A. Collins (Eds.), Self, community & psychology
(pp. 4.1–4.29). Lansdowne, South Africa: University of Cape Town Press.
Makhubela, M. (2016). ‘From psychology in Africa to African psychology’: Going
nowhere slowly. Psychology in Society (PINS), 52, pp. 1–18. Retrieved on
12 April 2018 from http://dx.doi.org/10.17159/2309-8708/2016/n52a1.
Moletsane, M. (2016). Understanding the role of indigenous knowledge in
psychological assessment and intervention in a multicultural South African
context. In R. Ferreira (Ed.), Psychological assessment: Thinking innovatively in
contexts of diversity (pp. 20−36). Cape Town: Juta and Company.
Mpofu, E., Ntinda, K. & Oakland, T. (2012). Understanding human abilities in sub-
Saharan African settings. Online Readings in Psychology and Culture, 4(3).
Retrieved on 12 April 2018 from https://doi.org/10.9707/2307-0919.1036.
Nsamenang, A.B. (2007). Origins and development of scientific
psychology in Afrique Noire. In M.J. Stevens and D. Wedding (Eds.,
under the supervision of John G. Adair). Psychology: IUPsyS Global
Resource (Edition 2007). London: Psychology Press.
Nwoye, A. (2015). What is African psychology the psychology of?
Theory & Psychology, 25(1), pp. 96–116.
Ratele, K. (2017). Four (African) psychologies. Theory &
Psychology, 27(3), pp. 313– 327.
Valchev, V.H., Van de Vijver, F.J.R., Nel, J.A., Rothmann, S. & Meiring,
D. (2013). The use of traits and contextual information in free
personality descriptions across ethnocultural groups in South Africa.
Journal of Personality and Social Psychology, 104, pp. 1077−1091.
(Can be accessed online at http://dx.doi.org/10.1037/a0032276).
Van der Merwe, K., Ntinda, K. & Mpofu, E. (2016). African perspectives of personality
psychology. In L.J. Nicholas (Ed.), Personality Psychology (pp. 32–50). Cape
Town, South Africa: Oxford University Press Southern Africa (Pty) Ltd.
Zuilkowski, S.S., McCoy, D.C., Serpell, R., Matafwali, B. & Fink, G. (2016).
Dimensionality and the development of cognitive assessments for children in
sub-Saharan Africa. Journal of Cross-Cultural Psychology, 47(3), pp. 341–354.
(Can be accessed online at http://dx.doi.org/10.1177/0022022115624155).
2 Psychological assessment: A brief retrospective overview
CHERYL FOXCROFT, GERT ROODT, AND FATIMA ABRAHAMS

CHAPTER OUTCOMES
By the end of this chapter you will be able to:
❱ understand how assessment has evolved since ancient times
❱ appreciate the factors that have shaped psychological
assessment in South Africa
❱ develop an argument about why assessment is still valued in
modern society.

2.1 Introduction
At the start of our journey into the field of psychological assessment, it
is important to gain a perspective on its origins. This is the focus of
this chapter. Without some idea of the historical roots of the discipline
of psychological assessment, the great progress made by modern
assessment measures cannot be fully appreciated. In this chapter you
will also be introduced to some of the key concepts that we will be
elaborating on in the Foundation Zone of this book.
As you journey through the past with us, you should be on the
lookout for the following:
• how difficult it was in ancient times to find an objectively verifiable way of measuring human attributes (ask yourself the reasons for this)
• how the most obvious things in the world of ancient philosophers and scientists (such as the human hand, head, and body, as well as animals) were used in an attempt to describe personal attributes
• the stepping stones that some of the ancient ‘measures’ provided for the development of modern psychological assessment
• the factors both within and outside of the discipline of psychology that have shaped the development of modern psychological assessment
• the factors that shaped the development and use of psychological assessment in South Africa.

2.2 A brief overview of the early origins of psychological assessment
The use of assessment measures can be traced back to ancient
times. One of the first recordings of the use of an assessment
procedure for selection purposes can be found in the Bible in Judges
Chapter 7, verses 1 to 8. Gideon observed how his soldiers drank
water from a river, so he could select those who remained on the alert.
Historians credit the Chinese with having a relatively sophisticated
testing programme for civil servants in place more than 4 000 years
ago (Kaplan & Saccuzzo, 2009). Oral examinations were administered
every third year, and the results were used for work evaluations and
for promotion purposes.
Over the years, many authors, philosophers, and scientists have
explored various avenues in their attempts to assess human
attributes. Let us look at a few of these.

2.2.1 Astrology
Most people are aware of the horoscopes that appear in daily
newspapers and popular magazines. The positions of planets are
used to formulate personal horoscopes that describe the personality
characteristics of individuals and to predict what might happen in their
lives. The origin of horoscopes can be traced back to ancient times,
possibly as early as the fifth century BCE (McReynolds, 1986). Davey
(1989) concludes that scientists, on the whole, have been scathing in
their rejection of astrology as a key to understanding and describing
personality characteristics. Do you agree? Briefly state your reasons.

2.2.2 Physiognomy
McReynolds (1986) credits Pythagoras with being perhaps the earliest practitioner of physiognomy, in the sixth century BCE. Later on,
Aristotle also came out in support of physiognomy, which attempted
to judge a person’s character from the external features of the body
and especially the face, in relation to the similarity that these features
had to animals. Physiognomy was based on the assumption that
people who shared physical similarities with animals also shared
some psychic properties with these animals. For example, a person
who looked like a fox was sly, or somebody who looked like an owl
was wise (Davey, 1989). What is your view on this?

❱❱ CRITICAL THINKING CHALLENGE 2.1

Many application forms for employment positions or for furthering your studies
require that a photograph be submitted. It is highly unlikely that selection and
admission personnel use these photographs to judge personal attributes of the
applicants, as physiognomists would have done. So why do you think that
photographs are requested? Try to interview someone in the human resources
section of a company, or an admissions officer at an educational institution, to
see what purpose, if any, photographs serve on application forms.

2.2.3 Humorology
In the fifth century BCE, Hippocrates, the father of medicine,
developed the concept that there were four body humours or fluids
(blood, yellow bile, black bile, and phlegm) (McReynolds, 1986).
Galen, a physician in ancient Rome, took these ideas further by
hypothesising four types of temperament (sanguine, choleric,
melancholic, and phlegmatic), corresponding to the four humours
(Aiken & Groth-Marnat, 2005). The problem with the humoral
approach of classifying personality types into one of four categories
was that it remained a hypothesis that was never objectively verified.
Today the humoral theory mainly has historical significance.
However, based on the views of Hippocrates and Galen, Eysenck
and Eysenck (1958) embedded the four temperaments within the
introversion/extroversion and the emotionally stable/emotionally
unstable (neurotic) personality dimensions that they proposed. Of
interest is the fact that Eysenck and Eysenck’s (1958) two personality
dimensions still form the basis for modern personality measures such
as the Myers-Briggs Type Indicator and the 16 Personality Factor
Questionnaire. You can read more about these measures in Chapter
12.
2.2.4 Phrenology
Franz Gall was the founder of phrenology, the ‘science’ of ‘reading
people’s heads’ (McReynolds, 1986). Phrenologists believed that the
brain consisted of a number of organs that corresponded with various
personality characteristics (e.g. self-esteem, cautiousness, firmness)
and cognitive faculties (e.g. language, memory, calculation). By
feeling the topography of a person’s skull, phrenologists argued that it
was possible to locate ‘bumps’ over specific brain areas believed to be
associated with certain personality attributes (Aiken & Groth-Marnat,
2005). The fundamental assumptions underlying phrenology were
later demonstrated to be invalid in research studies. Consequently, no
one really places any value on phrenology today.

2.2.5 Chirology – Palmistry


Bayne (as cited in Davey, 1989) asserted that palm creases (unlike
fingerprints) can change and he found that certain changes appeared
to be related to changes in personality. He also believed that all hand
characteristics should be taken into consideration before any valid
assessments could be made. However, to this day, no scientific
evidence has been found that, for example, a firm handshake is a sign
of honesty, or that long fingers suggest an artistic temperament
(Davey, 1989).

2.2.6 Graphology
Graphology can be defined as the systematic study of handwriting.
Handwriting provides graphologists with cues that are called
‘crystallised gestures’ that can be analysed in detail. As handwriting is
a type of stylistic behaviour, there is some logic to the argument that it
could be seen to be an expression of personality characteristics.
Graphologists hypothesise that people who keep their handwriting
small are likely to be introverted, modest, humble, and shun publicity.
Large handwriting, on the other hand, shows a desire to ‘think big’,
which, if supported by intelligence and drive, provides the ingredients
for success. Upright writing is said to indicate self-reliance, poise,
calm and self-composure, reserve, and a neutral attitude (Davey,
1989). Davey (1989) concluded that the efforts of graphologists to establish the validity of such claims have yielded few, if any, positive results.
Although there are almost no studies in which it has been found
that handwriting is a valid predictor of job performance, graphology is
widely used in personnel selection to this day (Simner & Goffin, 2003).
It is especially used in France, but also in other countries such as
Belgium, Germany, Italy, Israel, Great Britain, and the United States
(US). This has prompted Murphy and Davidshofer (2005) to ask why
graphology remains popular. What reasons do you think they
unearthed in attempting to answer this question?
Murphy and Davidshofer (2005) concluded that there were three
main reasons that fuelled the popularity of handwriting analysis in
personnel selection:
• It has high face validity, meaning that to the ordinary person in the street it seems reasonable that handwriting could provide indicators of personality characteristics, just as mannerisms and facial expressions do.
• Graphologists tend to make holistic descriptions of candidates such as ‘honest’, ‘sincere’, and ‘shows insight’, which, because of their vagueness, are difficult to prove or disprove.
• Some of the predictions of graphologists are valid. However, Murphy and Davidshofer (2005) cite research studies that reveal that the validity of the inferences drawn by the graphologists was related more to what they gleaned from the content of an applicant’s biographical essay than the analysis of the handwriting!
Despite having found reasons why graphology continues to be used in
personnel selection, Murphy and Davidshofer (2005) concluded that,
all things considered, there is not sufficient evidence to support the
use of graphology in employment testing and selection. Simner and
Goffin (2003) concur with this and argue that the criterion-related
validity of graphology is lower and more variable than that of more
widely known and less expensive measures. For example, whereas
the criterion-related validity of graphology varies between .09 and .16
(Simner & Goffin, 2003), the criterion-related validity of general mental
testing and structured interviews in job selection has been found to be
.51, & when used in combination, the validity coefficient increases to
.63 (Schmidt & Hunter, 1998). Simner and Goffin (2003) thus caution
that the continued use of graphology for personnel selection could
prove to be costly and harmful to organisations.
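To make the cited coefficients concrete: criterion-related validity is usually estimated as the correlation between a predictor score (e.g. a graphological rating or a mental ability test score) and a criterion (e.g. rated job performance). The short Python sketch below shows the computation; the numbers are invented for illustration and are not data from Simner and Goffin (2003) or Schmidt and Hunter (1998):

# Minimal sketch: a criterion-related validity coefficient is the
# Pearson correlation between predictor scores and a criterion such
# as rated job performance. All numbers are invented for illustration.
from statistics import correlation  # available from Python 3.10

predictor_scores = [12, 15, 9, 18, 14, 11, 16, 13]          # test scores
job_performance = [3.1, 3.8, 2.5, 4.2, 3.5, 2.9, 3.9, 3.2]  # supervisor ratings

validity = correlation(predictor_scores, job_performance)
print(f"criterion-related validity r = {validity:.2f}")

A coefficient of about .10 (graphology) means the predictor tells you almost nothing about the criterion, whereas about .51 (general mental ability), or .63 when combined with a structured interview, reflects substantially more accurate prediction.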

2.2.7 Summary
None of the avenues explored by the early philosophers, writers, and scientists provided verifiable ways of measuring human attributes. The common thread running through all these attempts (with the possible exception of graphology) is the lack of proper scientific method and, ultimately, of rigorous scientific measurement.

2.3 The development of modern psychological assessment: An international perspective
2.3.1 Early developments
Psychology only started to prosper and grow as a science after the development of the scientific method. Underlying the scientific
method is measurement. Guilford stated as long ago as 1936 that
psychologists have adopted the motto of Thorndike that ‘whatever
exists at all, exists in some amount’ and that they have also adopted
the corollary that ‘whatever exists in some amount, can be measured’
(Thorndike, 1918, p. 16). It was perhaps the development of objective
measurement that made the greatest contribution to the development
of psychology as a science.
During the Renaissance, the Spanish physician Juan Huarte wrote a book that was later translated into English as The Tryal of Wits (1698). This book was a milestone in the history of assessment because, for the first time, someone proposed a discipline of assessment, gave it a task to do, and offered some suggestions on how it might proceed. Huarte pointed out that:
• people differ from one another with regard to certain talents
• different vocations require different sets of talents
• a system should be developed to determine the specific patterns of abilities of different persons so that they can be guided into appropriate education programmes and occupations. This system would involve the appointment of a number of examiners (triers) who would carry out certain procedures (tryals) in order to determine a person’s capacity (McReynolds, 1986).
A further milestone in the development of modern psychological
assessment came from the work of Thomasius, a professor of
philosophy in Germany. According to McReynolds (1986), Thomasius
made two main contributions to the emerging field of assessment. He
was the first person to develop behavioural rating scales and,
furthermore, the ratings in his scales were primarily dependent on
direct observations of the subject’s behaviour.
Another milestone was the coining of the term psychometrics by
Wolff. This term was used throughout the eighteenth and nineteenth
centuries, but was mainly applied to psychophysical measurements
(McReynolds, 1986). In the twentieth century, with the shift towards
the measurement of individual differences, the term was applied to a
wider variety of measuring instruments, such as cognitive (mental
ability) and personality-related measures.
After the foundation had been laid by experimental psychologists
such as Wundt, the latter half of the nineteenth century saw some
promising developments in the field of assessment linked to the work
of Francis Galton, James McKeen Cattell, and Alfred Binet. One of
experimental psychology’s major contributions to the field of
psychological assessment was the notion that assessment should be
viewed in the same light as an experiment, as it required the same
rigorous control. As you will discover, one of the hallmarks of modern
psychological assessment is that assessment measures are
administered under highly standardised conditions.

2.3.2 The early twentieth century


The twentieth century witnessed genuine progress in psychological
assessment and one of the first overviews of available tests was
published by Whipple (1914). The progress made has mainly been
attributed to advances in:
• theories of human behaviour that could guide the development of assessment measures
• statistical methods that aided the analysis of data obtained from measures to determine their relationship to job performance and achievement, for example, as well as to uncover the underlying dimensions being tapped by a measure
• the application of psychology in clinical, educational, military, and industrial settings.
Other than these advances, there was another important impetus that
fuelled the development of modern psychological assessment
measures in the twentieth century. Do you have any idea what this
was?

During the nineteenth century, and at the turn of the twentieth century in particular, a need arose to treat mentally disturbed and disabled people in a more humanitarian way.

To achieve this, the mental disorders and deficiencies of patients had to be properly assessed and classified. Uniform procedures needed to be found to differentiate people who were mentally insane, or who suffered from emotional disorders, from those who were mentally disabled or suffered from an intellectual deficit. A need therefore arose for the development of psychological assessment measures.
According to Aiken and Groth-Marnat (2005), an important
breakthrough in the development of modern psychological
assessment measures came at the start of the twentieth century. In
1904, the French Minister of Public Instruction appointed a
commission to find ways to identify mentally disabled individuals so
that they could be provided with appropriate educational opportunities.
One member of the French commission was Binet. Together with
Simon, a French physician, Binet developed the first measure that
provided a fairly practical and reliable way of measuring intelligence.
The 1905 Binet-Simon Scale became the benchmark for future
psychological tests.

The measure was given under standardised conditions (i.e. everyone was given the same test instructions and format). Furthermore, norms were developed, albeit using a small and unrepresentative sample. More important than the adequacy of the normative sample, though, was Binet and Simon’s notion that the availability of comparative scores could aid the interpretation of test performance.
In developing the Binet-Simon Scale, a domain approach was
followed. That is, the scale consisted ‘of items requiring for their
solution knowledge and skills considered informative about effective
performance at school’ (Poortinga & Klieme, 2016, p. 14). Such an
approach resulted in a scale that was less likely to measure universal
aspects of cognitive functioning across cultures (Poortinga & Klieme,
2016).
It is interesting to note that one of the earliest records of the
misuse of intelligence testing involved the Binet-Simon Scale
(Gregory, 2010). An influential American psychologist, Henry
Goddard, was concerned about what he believed to be the high rate of
mental retardation among immigrants entering the US. Consequently,
Goddard’s English translation of the Binet-Simon Scale was
administered to immigrants through a translator, just after they arrived
in the US. ‘Thus, a test devised in French, then translated to English
was, in turn, retranslated back to Yiddish, Hungarian, Italian, or
Russian; administered to bewildered labourers who had just endured
an Atlantic crossing; and interpreted according to the original French
norms’ (Gregory, 2000, p. 17). Add to this that as the Scale was
domain-oriented rather than tapping universal mental processes
(Poortinga & Klieme, 2016), it was not really appropriate to apply it
cross-culturally. It is thus not surprising that Goddard found that the
average intelligence of immigrants was low!
The Binet-Simon Scale relied heavily on the verbal skills of the
test-taker and, in its early years, was available in French and English
only. Consequently, its appropriateness for use with non-French or
non-English test-takers, illiterates, and with speech- and hearing-
impaired test-takers was questioned (Gregory, 2010). This sparked
the development of a number of non-verbal measures (e.g. Seguin
Form Board Test, Knox’s Digit Symbol Substitution Test, the Kohs
Block Design Test, and the Porteus Maze Test).

World War I further fuelled the need for psychological assessment measures.

Why? Large numbers of military recruits needed to be assessed, but at that stage only individually administered tests, such as the Binet-Simon scale, were available. So World War I highlighted the need for
large-scale group testing. Furthermore, the scope of testing
broadened at this time to include tests of achievement, aptitude,
interest, and personality. Following World War I, with the emergence
of group tests that largely used a multiple-choice format, there was
widespread optimism regarding the usefulness of psychological tests
(Kaplan & Saccuzzo, 2009). Samelson (1979, p. 154) points out that
Cattell remarked that during the war period ‘the army testing put
psychology on the map of the US’. Given that over a million people
were tested on the Army Alpha and Army Beta tests in the US,
Cattell’s observation has merit. Furthermore, the testing of pilots in
Italy and France, and the testing of truck drivers for the German army,
for example, suggest that the world wars did not only put psychology
on the map in the US but elsewhere in the world as well.

2.3.3 Measurement challenges


Although the period between the two world wars was a boom period
for the development of psychological measures, critics started pointing
out the weaknesses and limitations of existing measures. Although
this put test developers on the defensive, and dampened the
enthusiasm of assessment practitioners, the knowledge gained from
this critical look at testing inspired test developers to reach new
heights. To illustrate this point, let us consider two examples, one from
the field of intellectual assessment and the other from the field of
personality assessment.
The criticism of intelligence scales up to this point was that they were too dependent on language and verbal skills, which reduced their appropriateness for many individuals (e.g. illiterate people). To address this weakness, Wechsler included performance tests that did not require verbal responses when he published the first version of the Wechsler Intelligence Scales in 1939. Furthermore, whereas previous
intelligence scales only yielded one score (namely, the intelligence
quotient), the Wechsler Intelligence Scales yielded a variety of
summative scores from which a more detailed analysis of an
individual’s pattern of performance could be made. These innovations
revolutionised intelligence assessment.
The use of structured personality measures was severely criticised during the 1930s as many findings of personality tests could not be substantiated in scientific studies. However, the development of the Minnesota Multiphasic Personality Inventory (MMPI) by Hathaway and McKinley in 1943 began a new era for structured, objective personality measures.
The MMPI placed an emphasis on using empirical data to determine
the meaning of test results. According to Kaplan and Saccuzzo
(2009), the MMPI and its revision, the MMPI-2, are the most widely
used and referenced personality tests to this day.
World War II reaffirmed the value of psychological assessment.
The 1940s witnessed the emergence of new test-development
technologies, such as the use of factor analysis to construct tests such
as the 16 Personality Factor Questionnaire.
During this period there was also much growth in the application of
psychology and psychological testing. Psychological testing came to
be seen as one of the major functions of psychologists working in
applied settings. In 1954 the American Psychological Association
(APA) pronounced that psychological testing was exclusively the
domain of the clinical psychologist. However, the APA unfortunately
also pronounced that psychologists were permitted to conduct
psychotherapy only in collaboration with medical practitioners.
As you can imagine, many clinical psychologists became
disillusioned by the fact that they could not practise psychotherapy
independently, and, although they had an important testing role to
fulfil, they began to feel that they were merely technicians who were
playing a subservient role to medical practitioners. Consequently,
when they looked around for something to blame for their poor
position, the most obvious scapegoat was psychological testing
(Lewandowski & Saccuzzo, 1976). At the same time, given the
intrusive nature of tests and the potential to abuse testing, widespread
mistrust and suspicion of tests and testing came to the fore. So, with
both psychologists and the public becoming rapidly disillusioned with
tests, many psychologists refused to use any tests, and countries
such as the US, Sweden, and Denmark banned the use of tests for
selection purposes in industry. So it is not surprising that according to
Kaplan and Saccuzzo (2009), the status of psychological assessment
declined sharply from the late 1950s, and this decline persisted until
the 1970s, when the use and usefulness of tests gradually began to
increase.
Why? The standardised nature of tests was seen to provide a common basis for making clinical diagnoses, and the economies of scale that tests provided resulted in an increase in the use of tests for selection and placement in industry. Poortinga and Klieme (2016, p. 17) argue that this period, in which test use was sharply criticised, declined, and then increased, led to the following understanding:

Tests may be fallible, but it was appreciated that through standardisation and psychometric analysis of validity, the weaknesses (and strengths) of tests can be made explicit, while other methods of assessment and diagnosis may lead to even poorer judgements but are not open to public scrutiny.

This statement remains a good summary of the status of psychological testing versus other assessment approaches in the second decade of the twenty-first century.
2.3.4 The influence of technology
In the 1950s and 1960s, the technical innovation of the high-speed
scanner to score test protocols (viewed as the first use of computer
technology in testing) increased the use of interest questionnaires
(e.g. the Strong Vocational Interest Blanks) and personality tests (e.g.
the Minnesota Multiphasic Personality Inventory, the MMPI), in
particular. This invention, coupled with greater use of multiple-choice
item formats, led to the increased efficiency of, and a reduction in, the
cost of testing (Clarke, Madaus, Horn & Ramos, 2000).
According to Joubert and Kriek (2009) the ‘use of computer-based
testing has been reported in the literature since 1963’ (p. 78). In 1962
the first computer-based test interpretation (CBTI) system was
developed at the Mayo Clinic for the MMPI (Swenson, Rome,
Pearson, and Brannick, 1965). This was followed by the development
of a system for the computer-based interpretation of the Rorschach
Inkblot Test in 1964 (Gregory, 2007). The 1970s witnessed the first
automation of the entire assessment process at a psychiatric hospital
in Utah and the introduction of computerised adaptive testing (CAT)
(see Chapter 14, Section 14.2).
Until the 1980s, the role of the computer in testing was restricted
mainly to recording answers and computing test scores (Murphy &
Davidshofer, 2005). The first original applications of computerised
testing involved the computerisation of the items of paper-and-pencil
tests, for both administration and scoring purposes (Bunderson,
Inouye & Olsen, 1989; Linn, 1989). This continued until the early
1980s, when the emergence of advanced computer technology, and
the possibilities this presented for the design, administration, and
scoring of tests, had a large impact on all aspects of the testing
industry (Clarke, Madaus, Horn & Ramos, 2000).
The advent of computer-based tests meant that computers
became an integral part of the process of testing (administration,
scoring, and reporting), rather than simply providing the means of
scoring the tests as they did initially. Computer-based testing differs
from the conventional paper-based testing primarily in that test-takers
answer the questions using a computer rather than a pencil and paper
(Clarke et al., 2000). Computer technology also made it possible to
develop different types of items. For example, in 1994 the first
multimedia assessment batteries were developed to assess real-life
problem-solving skills in prospective employees (Gregory, 2010).
During the 1990s and the twenty-first century, computer-based
testing and its delivery over the Internet have brought more flexibility
to the testing arena (testing can happen anytime, and anywhere). The
efficiency of test delivery has also been increased, and the test-
development process has been enhanced. Expert item writers from all
over the world can be contracted to develop items (i.e. the questions,
tasks, and activities that require a response in a test), and can be
linked to each other electronically. This not only facilitates the item-
development process but also the item-refinement process. The
process of assessment generation can now be done through item-
generation engines, which encapsulate a model of the intended
assessment, and take the form of software that writes the test at the
touch of a key. The use of computers has also helped test developers
to re-envision what test items look like, and how they are scored and
reported (Zenisky & Sireci, 2002).
Stark, Martin and Chernyshenko (2016, pp. 395–398) assert that
technology has impacted on psychological and educational testing in
five main ways. These are summarised in the table below, along with
some challenges experienced.
Table 2.1 The impact of technology on psychological and educational testing

1. Impact on testing: Assessment measures are more accessible, given that tests can be delivered to computing devices and mobile phones via the Internet.
   Challenge(s): The need to improve online proctoring, verifying the identity of the test-taker, and item and data security. Also, using different computing devices poses challenges to test standardisation.

2. Impact on testing: Passive assessment of a person applying for a job or academic studies can be conducted by exploring and mining social-media sites, which can provide insights into personality functioning, values, attitudes, etc., but also raises ethical dilemmas (e.g. Park, Schwartz, Eichstaedt, Kern, Kosinski, Stillwell, Ungar & Seligman, 2015).
   Challenge(s): There is little, if any, validity evidence that passive assessments of personal characteristics deduced from social media are an accurate sample and reflection of a person’s behaviour.

3. Impact on testing: Increased efficiency, in that screening can be conducted online, allowing only those who meet the criteria, or who require more in-depth assessment, to be identified (e.g. Mead, Olson-Buchanan and Drasgow, 2013). Automated scoring and e-rater systems also increase the speed, and reduce the cost, of scoring test answers, especially constructed response tasks (e.g. essays).
   Challenge(s): The need to ensure that automated scoring programmes are sensitive to cultural and linguistic diversity. For example, if an essay is written by a second-language English speaker, does the automated rating system make allowance for this?

4. Impact on testing: Psychometric methodologies have been advanced, which has been of great benefit in large-scale testing programmes. For instance, advances in Item Response Theory (IRT) have improved test construction and scoring, as well as item generation, allowing item exposure and compromise to be controlled, and security breaches to be detected (a brief illustration of an IRT item model follows this table).
   Challenge(s): Researchers and practitioners need to be skilled in these new psychometric methodologies.

5. Impact on testing: New item types have emerged, as has the use of simulations and gaming in assessment (see Chapter 14 for a fuller discussion).
   Challenge(s): The need to ensure that the focus remains on the appropriateness of these new item types for the intended assessment purpose and the intended test-takers, rather than on simply wanting to advance the application of new technologies in assessment.
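For readers meeting IRT for the first time, its core idea fits in a few lines. In the widely used two-parameter logistic (2PL) model, the probability that a test-taker of ability theta answers an item correctly depends on the item’s difficulty (b) and discrimination (a). The Python sketch below is a generic textbook illustration of that model, with invented parameter values; it does not reproduce any specific testing programme discussed by Stark, Martin and Chernyshenko (2016):

# Minimal sketch of the two-parameter logistic (2PL) IRT model:
#   P(correct | theta) = 1 / (1 + exp(-a * (theta - b)))
# theta = test-taker ability, b = item difficulty, a = discrimination.
# Parameter values below are invented for illustration.
import math

def p_correct(theta: float, a: float, b: float) -> float:
    """Probability of a correct response under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

easy_item = {"a": 1.0, "b": -1.0}   # easy, moderately discriminating
hard_item = {"a": 1.5, "b": 1.0}    # hard, more discriminating

for theta in (-2.0, 0.0, 2.0):
    print(f"ability {theta:+.1f}: "
          f"easy item {p_correct(theta, **easy_item):.2f}, "
          f"hard item {p_correct(theta, **hard_item):.2f}")

Computerised adaptive testing (see Chapter 14) builds directly on such models: each successive item is chosen to be maximally informative at the test-taker’s current ability estimate.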

Despite the advances that computer-based and Internet-delivered testing has brought to the field of testing and assessment, it has also brought many legal and ethical challenges. Consequently, various good practice guidelines have been published, with the International Test Commission Guidelines on Computer-Based and Internet-Delivered Testing (International Test Commission, 2006) and A Test-taker’s Guide to Technology-based Testing (International Test Commission, 2010) being the most recent.

2.3.5 The influence of multiculturalism


In the latter part of the twentieth century and during the first two
decades of the twenty-first century, multiculturalism started becoming
the norm in many countries. As a result, attempts were made to
develop tests that were ‘culture-free’. An example of such a measure
is the Culture-free Intelligence Test (Anastasi & Urbina, 1997).
However, it soon became clear that it was not possible to develop a
test free of any cultural influence. Consequently, test developers
focused more on ‘culture-reduced’ or ‘culture-common’ tests in
which the aim was to remove as much cultural bias as possible from
the test by including only behaviour that was common across cultures.
For example, a number of non-verbal intelligence tests were
developed (e.g. Test of Non-verbal Intelligence, Raven’s Progressive
Matrices) where the focus was on novel problem-solving tasks and in
which language use, which is often a stumbling block in cross-cultural
tests, was minimised.
Furthermore, given that most of the available measures have been
developed in the US or the United Kingdom (UK), they tend to be
more appropriate for Westernised English-speaking people. In
response to the rapid globalisation of the world’s population and the
need for measures to be more culturally appropriate and available in
the language in which the test-taker is proficient in, the focus of
psychological testing in the 1980s and 1990s shifted to cross-cultural
test adaptation. Under the leadership of Ron Hambleton from the US,
the International Test Commission (ITC) developed Guidelines for
Adapting and Translating Tests (International Test Commission,
2016), which have become the benchmark for cross-cultural test
translation and adaptation around the world. They have also assisted
in advocating against assessment practices where test-takers are
tested in languages in which they are not proficient, sometimes using
a translator who translates the test ‘on the run’. In addition, many
methodologies and statistical techniques (e.g. Structural Equation
Modelling) have been developed to establish whether different
language versions of a test are equivalent (Hambleton, Merenda &
Spielberger, 2005). Sparked by research stemming from large-scale
international comparative tests such as the Trends in International
Mathematics and Science Study (TIMSS) and the Progress in International Reading Literacy Study (PIRLS), the second decade of
the twenty-first century has seen renewed interest in issues of bias
and fairness when testing in linguistically diverse contexts.
Consequently, the ITC has decided to develop guidelines for testing
language minorities. You can read more about how and why
measures are adapted for use in different countries and cultures in
Chapter 6 and about language issues in assessment in Chapters 7
and 9.
A new trend that is emerging in the twenty-first century is to
approach the development of tests that are used widely internationally
(e.g. the Wechsler Intelligence Scales) from a multicultural
perspective. For example, when it comes to the Wechsler Intelligence
Scales for Children (WISC), the norm through the years has been to
first develop and standardise the measure for the US, and thereafter
to adapt it for use outside the US. However, for the development of
the Wechsler Intelligence Scale for Children – Fourth Edition (WISC-
IV), experts from various countries provided input on the constructs to
be tapped as well as the content of the items to minimise potential
cultural bias during the initial redesign phase (Weiss, 2003). In the
process, the development of the WISC-IV set a new benchmark for
the development of internationally applicable tests.
A further trend in multicultural and multilingual test development is
that of simultaneous multilingual test development (Solano-Flores,
Turnbull & Nelson-Barber, 2002; Tanzer, 2005). This differs from the
process outlined for the development of the WISC-IV, and which
continues to be used in test development currently, where people from
different cultural and language groups provide input on the
construct(s) to be tapped but the items are still developed in English
before they are translated into other languages. Instead, in
simultaneous multilingual test development, once the test
specifications have been developed, items are written by a multilingual
and multicultural panel or committee where each member has a
background in psychology (general and cross-cultural in particular),
measurement and linguistics as well as with respect to the specific
construct that the test will measure (e.g. personality, mathematics).
Chapter 7 will provide more information on this approach.
Non-Western countries are also rising to the challenge of not only adapting Westernised measures for their contexts (e.g. Gintiliene and Girdzijauskiene, 2008, report that, among others, the WISC-III and the Raven’s Coloured Matrices have been adapted for use in Lithuania), but also of developing their own indigenous measures, which are more suited to their cultural contexts. For example, Cheung and her colleagues have developed the Chinese Personality Assessment Inventory (CPAI), which was revised in 2000 and is now known as the CPAI-2 (Cheung, Leung, Fan, Song, Zhang, & Zhang, 1996). This measure includes
both indigenous (culturally relevant) and universal personality
dimensions. Indigenous personality constructs were derived from
classical literature, everyday descriptions of people, surveys, and
previous psychological research. Thereafter items and scales were
developed according to the highest acceptable psychometric
standards. You can get information on the CPAI and its development
by consulting Cheung and Cheung (2003). The way in which the CPAI
was developed is widely regarded as the benchmark to attain in the
development of culturally relevant personality measures. How this
approach is being successfully used in the South African context will
be covered in Chapter 7.
Multiculturalism has not only influenced how tests are developed and adapted. With rapid globalisation making most societies increasingly multicultural, the choice of which norm group to use to compare an individual’s performance against has also become an issue. As will be outlined in Chapter 3, norms provide a
basis for comparing an individual’s performance on a measure with
the performance of a well-defined reference group to aid in the
interpretation of test performance. Norm or reference groups can be
constituted in terms of various characteristics of people (e.g. age,
gender, educational level, job level, language, clinical diagnosis). The
key criterion when choosing an appropriate norm group to compare an
individual’s performance to is linked to the purpose of the comparison.
For example, if the intention is to compare the performance with
others in the same cultural group of a similar age, culturally and age-
appropriate norms must be used. However, what norms does a
multinational organisation use to make comparisons of its workforce
across cultural and national boundaries? Bartram (2008a) argues that using locally developed national norms is not appropriate in this instance. Instead, he argues that multinational norms should be developed and used, provided that the mix of country samples is reasonable and that the samples have similar demographics.
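As a simple numerical illustration of norm-referenced interpretation (the mechanics are covered in Chapter 3): if a norm group’s scores are approximately normally distributed, a raw score can be converted to a z-score and percentile relative to that group. In the Python sketch below all means and standard deviations are invented; the point is that the same raw score ‘means’ something different against different norm groups, which is exactly why the choice of norm group matters:

# Minimal sketch: interpreting one raw score against two different
# (hypothetical) norm groups. All norm figures are invented.
from statistics import NormalDist

raw_score = 62

norm_groups = {
    "local national norms": NormalDist(mu=55, sigma=10),
    "multinational norms": NormalDist(mu=60, sigma=8),
}

for name, dist in norm_groups.items():
    z = (raw_score - dist.mean) / dist.stdev      # standard score
    percentile = dist.cdf(raw_score) * 100        # % of norm group scoring below
    print(f"{name}: z = {z:.2f}, percentile = {percentile:.0f}")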

2.3.6 Standards, training, and test users’ roles


In an attempt to address issues of fairness and bias in test use, the
need arose to develop standards for the professional practice of
testing and assessment. Under the leadership of Bartram from the
UK, the ITC developed a set of International Guidelines for Test-use
(Version 2000). Many countries, including South Africa, have adopted
these test-user standards, which should ensure that, wherever testing
and assessment is undertaken in the world, similar practice standards
should be evident. You can read more about this in Chapter 8.
During the 1990s, competency-based training of assessment
practitioners (test users) fell under the spotlight. The British
Psychological Society (BPS) took the lead internationally in
developing competency standards for different levels of test users in
occupational testing in the UK. Based on these competencies,
competency-based training programmes have been developed, and
all test users have to be assessed by BPS-appointed assessors in
order to ensure that a uniform standard is maintained. (You can obtain
more information from the Web site of the Psychological Testing
Centre of the BPS at www.psychtesting.org.uk). Building on the work
done in the UK, Sweden, Norway, and the Netherlands, the Standing
Committee of Tests and Testing (which has subsequently been
renamed the Board of Assessment) of the European Federation of
Psychology Associations (EFPA) developed standards of competence
related to test use and a qualification model for Europe as a whole. A
Test-user Accreditation Committee has been established to facilitate
the recognition of qualifications that meet or can be developed to meet
such standards (visit www.efpa.be for more information).
With assessment being widely used in recruitment, selection, and
training in work and organisational settings, a standard was
‘developed by ISO (the International Organisation for Standardisation)
to ensure that assessment procedures and methods are used properly
and ethically’ (Bartram, 2008b, p. 9) and that those performing the
assessment are suitably qualified. Published in 2010, the ISO
Standard for Assessment in Organisational Settings (PC 230) now
provides a benchmark for providers of assessment services to
demonstrate that they have the necessary skills and expertise to
provide assessments that are ‘fit for purpose’. In addition, the standard
can be used by individual organisations to certify internal quality
assurance processes, or by those contracting in assessment services
to ascertain the minimum requirements that need to be met by the
organisation providing the services, or by professional bodies for
credentialling purposes.
Other than focusing on the competence of assessment
practitioners, there has been increased emphasis on test quality in
the last two decades. Evers (2012) argues that such an emphasis is
not surprising given the ‘importance of psychological tests for the work
of psychologists, the impact tests may have on their clients, and the
emphasis on quality issues in current society’ (p. 137). Many countries
(e.g. the UK, Netherlands, Russia, Brazil, and South Africa) have test
review systems – a mechanism used to set standards for and
evaluate test quality. Chapter 8 will provide more detail on test review
systems.
Linked to, but not restricted to, computer-based and Internet-delivered testing, a trend has emerged around who may use
psychological measures and what their qualifications should be.
Among the more important issues is confusion regarding the roles and
responsibilities of people involved in the assessment process and
what knowledge, qualifications, and expertise they require, which has
been fuelled by the distinction that is drawn between competency-
based and psychological assessment (Bartram, 2003). The block
below presents a fairly simplistic description of these two types of
assessment to illustrate the distinction between them.

Psychological assessment requires expertise in psychology and psychological theories to ensure that measures of cognitive, aptitude, and personality functioning are used in an ethical and fair manner, right from the choice of which tests to use through to interpretation and feedback. Furthermore, the outputs of psychological assessment are in the form of psychological traits/constructs (such as personality and ability). The expertise to perform psychological assessment is clearly embodied in an appropriately registered psychology professional.
Competency-based assessment focuses on the skills, behaviours, knowledge, and attitudes/values required for effective performance in the workplace or in educational settings (e.g. communication, problem-solving, task orientation). The assessment measures used are as directly linked as possible to the required competencies. Indirect methods such as simulations and assessment centres are used to conduct competency-based assessment. As the outputs of such assessment are directly linked to the language of the workplace or educational settings, the test user does not need expertise in psychological theories to be able to apply the results of competency-based assessments. What is required, however, is that competency-based assessment needs to be performed by people with expertise in this area of assessment (e.g. skilled in job analysis and competency-based interviews).

Bartram (2003) argues that there is reasonable international consensus regarding the distinction between psychological and
competency-based assessment. Consequently, some countries (e.g.
Finland) have used this to legally distinguish between assessments
that only psychologists can do (psychological assessment) and those
that non-psychologists can do (competency-based assessment). The
advent of computer-based and Internet-delivered assessment has further meant that, in competency-based assessment, the test user is not involved in test selection, administration, scoring, and interpretation. Instead, the test user receives the output of the assessment in a language congruent with the competencies required in the workplace. In view of the changing nature of the definition of a test user, it is important that the roles, responsibilities, and required training of all those involved in the assessment process are clarified to ensure that the test-taker is assessed fairly and appropriately.
To draw this brief international review of the history of psychological assessment to a close, one could ask: what is the status of psychological testing and assessment nearly two decades into the twenty-first century? Roe (2008) asserts that the field of
psychometrics and psychological testing is doing well as it continues
to innovatively respond to issues (e.g. globalisation, and increased
use of the Internet); develop new test-development methodologies;
advance test theory, data analysis techniques, and measurement
models (Alonso-Arbiol & Van de Vijver, 2010); and to be sensitive to
the rights and reactions of test-takers. Among the challenges that Roe
(2008) identifies are the fact that test scores are not easily understood
by the public, that psychology professionals struggle to communicate
with policymakers, and that test security and cheating on tests are
growing issues. He also argues that for testing to make a better
contribution to society, psychology professionals need to
conceptualise the fact that testing and assessment (from test selection
to reporting) are part of the services that they offer to clients, and that
the added value of tests should not be confused with the added value
of test-based services.
Poortinga and Klieme (2016, p. 24) are concerned that insufficient
attention is given to the consequences of the application of test
results, which may at times have an ‘adverse impact’. Consequently,
they call for a more ‘societally oriented approach to testing … in which
stakeholders of all target populations are represented, including
representatives of client groups, such as trade unions, and ethnic and
migrant minorities’. Poortinga and Klieme (2016) further argue that
solutions need to be found to address the difficulties experienced in
testing culturally and linguistically diverse test-takers. They suggest
that the solution lies in having ‘closely collaborating teams … of
representatives of the various populations to which the instruments
will be administered’, developing tests where ‘emic [i.e. context-
specific] variations will be added to a common core’ (p. 25).

2.4 The development of modern psychological assessment: A South African perspective
2.4.1 The early years
In Africa, the earliest record of the use of psychological tests dates back to 1915 in South Africa (Laher & Cockcroft, 2014), and to 1917 elsewhere in sub-Saharan Africa (Poortinga & Klieme, 2016). Both
elsewhere in Africa and in South Africa, psychological tests were
initially largely used for researching individual differences in
intellectual and cognitive functioning.
When considering the historical development of modern
psychological measures in South Africa, it is important to note the
context in which this development took place. Psychological
assessment in South Africa developed in an environment
characterised by the unequal distribution of resources based on racial
categories (black, coloured, Indian, and white). Almost inevitably, the
development of psychological assessment reflected the racially
segregated society in which it evolved. So it is not surprising that
Claassen (1997) asserts that ‘Testing in South Africa cannot be
divorced from the country’s political, economic and social history’ (p.
297). Indeed, any account of the history of psychological assessment
in South Africa needs to point out the substantial impact that apartheid
policies had on test development and use (Nzimande, 1995).
Even before the Nationalist Party came into power in 1948, the
earliest psychological measures were standardised only for whites
and were used by the Education Department to place white pupils in
special education. The early measures ranged from those transported
here (e.g. the Porteus Maze Test), to adaptations of international
measures such as the Stanford-Binet, the South African revision of
which became known as the Fick Scale, to tests that were developed
specifically for use here, such as the South African Group Test
(Wilcocks, 1931).
Not only were the early measures only standardised for whites,
but, driven by political ideologies, measures of intellectual ability were
used in research studies to draw distinctions between races in an
attempt to show the superiority of one group over another. For
example, during the 1930s and 1940s, when the government was
grappling with the issue of establishing ‘Bantu education’, Fick (1929)
administered individual measures of motor and reasoning abilities,
which had only been standardised for white children, to a large sample
of black, coloured, Indian, and white school children. He found that the
mean score of black children was inferior to that of Indian and
coloured children, with whites’ mean scores superior to all groups. He
remarked at the time that factors such as inferior schools and teaching
methods, along with black children’s unfamiliarity with the nature of
the test tasks, could have disadvantaged their performance on the
measures. However, when he extended his research in 1939, he
attributed the inferior performance of black children in comparison to
that of white children to innate differences, or, in his words, ‘difference
in original ability’ (p. 53) between blacks and whites.
Fick’s conclusions were strongly challenged and disputed. For
example, Fick’s work was severely criticised by Biesheuvel in his book
African Intelligence (1943), in which an entire chapter was devoted to
this issue. Biesheuvel queried the cultural appropriateness of
Western-type intelligence tests for blacks and highlighted the influence
of different cultural, environmental, and temperamental factors and the
effects of malnutrition on intelligence. This led him to conclude that
‘under present circumstances, and by means of the usual techniques,
the difference between the intellectual capacity of Africans and
Europeans cannot be scientifically determined’ (Biesheuvel, 1943, p.
91).
In the early development and use of psychological measures in
South Africa, some important trends can be identified, which were set
to continue into the next and subsequent eras of psychological
assessment in South Africa. Any idea what these were?

The trends were:
• the focus on standardising measures for whites only
• the misuse of measures by administering measures standardised
for one group to another group without investigating whether or not
the measures might be biased and inappropriate for the other group
• the misuse of test results to reach conclusions about differences
between groups without considering the impact of inter alia cultural,
socio-economic, environmental, and educational factors on test
performance.

2.4.2 The early use of assessment measures in industry


The use of psychological measures in industry gained momentum
after World War II and after 1948 when the Nationalist Government
came into power. As was the case internationally, psychological
measures were developed in response to a societal need. According
to Claassen (1997), after World War II there was an urgent need to
identify the occupational suitability (especially for work on the mines)
of large numbers of blacks who had received very little formal
education. Among the better measures constructed was the General
Adaptability Battery (GAB) (Biesheuvel, 1949, 1952), which included a
practice session during which test-takers were familiarised with the
concepts required to solve the test problems and were asked to
complete some practice examples. The GAB was predominantly used
for a preliterate black population, speaking a number of dialects and
languages.
Because of job reservation under the apartheid regime, and because whites received better formal education opportunities under racially segregated education, whites and blacks competed for different categories of work.
The Otis Mental Ability Test, which was developed in the US and only
had American norms, was often used when assessing whites in
industry (Claassen, 1997).

Among the important trends in this era that would continue into
subsequent eras were:
• the development of measures in response to a need that
existed within a certain political dispensation
• the notion that people who are unfamiliar with the concepts in a
measure should be familiarised with them before they are assessed
• the use of overseas measures and their norms without investigating
whether they should be adapted/revised for use in South Africa.

2.4.3 The development of psychological assessment from the 1960s onwards
According to Claassen (1997), a large number of psychological
measures were developed in the period between 1960 and 1984. The
National Institute for Personnel Research (NIPR) concentrated on
developing measures for industry and pioneered computerised testing
in South Africa from the late 1970s onwards (see Chapter 13). The
Institute for Psychological and Edumetric Research (IPER) developed
measures for education and clinical practice. Both of these institutions
were later incorporated into the Human Sciences Research Council
(HSRC).
In the racially segregated South African society of the apartheid
era, it was almost inevitable that psychological measures would be
developed along cultural/racial lines as there ‘was little specific need
for common tests because the various groups did not compete with
each other’ (Owen, 1991, p. 112). Consequently, prior to the early
1980s, Western models were used to develop similar but separate
measures for the various racial and language groups (Owen, 1991).
Furthermore, although a reasonable number of measures were
developed for whites, considerably fewer measures were developed
for blacks, coloureds, and Indians. While the tests developed in this
period included South African content, little attention was given to
designing test tasks that were African-centred and which would be
more meaningful for the diverse South African people to engage with.
One of the exceptions was the work done by Reuning and Wortley
(1973) to map the cognitive functioning of the San people. Years
before, Porteus (1937) administered paper-and-pencil mazes to San
test-takers and concluded that they had low levels of intelligence. In
Reuning and Worley’s (1973) study, teams of South African
psychologists administered a wide range of perception and cognitive
measures to a sample of San in the Kalahari Desert. Among the
criteria used to adapt or develop the measures were that verbal instructions should be minimal, that the test-taker should be able to understand easily what response was expected, and that responses should be in the form of actions. Instead of using unfamiliar pencil-
and-paper tasks, a version of the Porteus Maze Test was developed
in which a three-dimensional maze was presented on a wooden
board. The maze was solved by moving a disc from a hole in the
centre to the exit of the maze by tilting the board in various directions.
Reuning and Wortley (1973) found that the San were ‘smart’, which
did not support Porteus’s (1937) findings that the San had subnormal
intelligence. This study is a good example of the value of using
African-centred tasks to obtain more appropriate and valid results. It is
a pity that these early examples of the value of adopting an African-
centred approach to developing tests and test tasks did not become
the norm.
During the 1980s and early 1990s, once the sociopolitical situation
began to change and discriminatory laws were repealed, starting with
the relaxation of ‘petty apartheid’, applicants from different racial
groups began competing for the same jobs and the use of separate
measures in such instances came under close scrutiny.

A number of questions were raised, such as:


• How can you compare scores if different measures are used?
• How do you appoint people if different measures are used?

In an attempt to address this problem, two approaches were followed.
In the first instance, measures were developed for more than one
racial group, and/or norms were constructed for more than one racial
group so that test performance could be interpreted in relation to an
appropriate norm group. Examples of such measures are the General
Scholastic Aptitude Test (GSAT), the Ability, Processing of
Information, and Learning Battery (APIL-B), and the Paper and Pencil
Games (PPG), which was the first measure to be available in all 11
official languages in South Africa. In the second instance,
psychological measures developed and standardised on only white
South Africans, as well as those imported from overseas, were used
to assess other groups as well. In the absence of appropriate norms,
the potentially bad habit arose of interpreting such test results ‘with
caution’.

Why was this a bad habit?

It eased assessment practitioners’ consciences and lulled them into a
sense that they were doing the best they could with the few tools at
their disposal. You can read more about this issue in Chapter 8. The
major problem with this approach was that, initially, little research was
done to determine the suitability of these measures for a multicultural
South African environment. Research studies that investigated the
performance of different groups on these measures were needed to
determine whether or not the measures were biased.
Despite the widespread use of psychological measures in South
Africa, the first thorough study of bias took place only in 1986. This
was when Owen (1986) investigated test and item bias using the
Senior Aptitude Test, Mechanical Insight Test, and the Scholastic
Proficiency Test on black, white, coloured, and Indian subjects. He
found major differences between the test scores of blacks and whites
and concluded that understanding and reducing the differential
performance between black and white South Africans would be a
major challenge. Research by Abrahams (1996), Owen (1989a,
1989b), Retief (1992), Taylor and Boeyens (1991), and Taylor and
Radford (1986) showed that bias existed in other South African ability
and personality measures as well. Other than empirical investigations
into test bias, Taylor (1987) also published a report on the
responsibilities of assessment practitioners and publishers with regard
to bias and fairness of measures. Furthermore, Owen (1992a, 1992b)
pointed out that comparable test performance could only be achieved
between different groups in South Africa if environmentally
disadvantaged test-takers were provided with sufficient training in
taking a particular measure before they actually took it.
Given the widespread use (and misuse) of potentially culturally
biased measures, coupled with a growing perception that measures
were a means by which the Nationalist Government could exclude
black South Africans from occupational and educational opportunities,
what do you think happened? A negative perception regarding the
usefulness of psychological measures developed and large sections
of the South African population began to reject the use of
psychological measures altogether (Claassen, 1997; Foxcroft, 1997b).
Issues related to the usefulness of measures will be explored further
in the last section of this chapter.
Remember that in the US testing came to be seen as one of the
most important functions of psychologists, and only psychologists.
During the 1970s important legislation was tabled in South Africa that
restricted the use of psychological assessment measures to
psychologists only. The Health Professions Act (No. 56 of 1974)
defines the use of psychological measures as constituting a
psychological act, which can legally only be performed by
psychologists, or certain other groups of professionals under certain
circumstances. The section of the act dealing with assessment will be
explored further in Chapter 8.

Among the important trends in this era were:


• the impact of the apartheid political dispensation on the
development and fair use of measures
• the need to empirically investigate test bias
• growing scepticism regarding the value of psychological
measures, especially for black South Africans.

2.4.4 Psychological assessment in democratic South Africa


2.4.4.1 Assessment in education
Since 1994 and the election of South Africa’s first democratic
government, the application, control, and development of assessment
measures have become contested terrain. With a growing resistance
to assessment measures and the ruling African National Congress’
(ANC) express purpose to focus on issues of equity to redress past
imbalances, the use of tests in industry and education, in particular,
has been placed under the spotlight.
School-readiness testing, as well as the routine administration of
group tests in schools, was banned in many provinces as such testing
was seen as being exclusionary and perpetuating the discriminatory
policies of the past. Furthermore, the usefulness of test results and
assessment practices in educational settings has been strongly
queried in the Education White Paper 6, Special Needs Education:
Building an Inclusive Education and Training System (Department of
Education, 2001), for example. Psychometric test results should
contribute to the identification of learning problems and educational
programme planning as well as informing the instruction of learners.
Why is this not happening? Could it be that the measures used are not
cross-culturally applicable or have not been adapted for our diverse
population, and thus do not provide valid and reliable results? Maybe
it is because the measures used are not sufficiently aligned with the
learning outcomes of Curriculum 21? Or could it be that psychological
assessment reports are filled with jargon and recommendations that
are not always translated into practical suggestions on what the
educator can do in class to support and develop the learner?
Furthermore, within an inclusionary educational system, the role of
the educational psychologist along with that of other professional
support staff is rapidly changing. Multidisciplinary professional district-
based support teams need to be created and have been established
in some provinces (e.g. the Western Cape) (Department of Education,
2005). Within these teams, the primary focus of the psychologists and
other professionals is to provide indirect support (‘consultancy’) to
learners through supporting educators and the school management
(e.g. to identify learner needs, and the teaching and learning
strategies that can respond to these needs, to conduct research to
map out needs of learners and educators, and to establish the efficacy
of a programme). However, Pillay (2011) notes that psychologists
often find it challenging to work in collaborative teams in educational
contexts, which could be linked to the fact that they need in-service
training to equip them in this regard. A further way that psychologists
can support educators in identifying barriers to learning in their
learners is to train them in the use of educational and psychological
screening measures (see Chapter 15 for a discussion on screening
versus diagnostic assessment).
The secondary focus of the professional support teams is to
provide direct learning support to learners (e.g. diagnostic assessment
to describe and understand a learner’s specific learning difficulties,
and to develop an intervention plan). As is the case with the changing
role of psychologists in assessment due to the advent of competency-
based, computer-based and Internet-delivered testing, the policies on
inclusive education and the strategies for its implementation in South
Africa are changing the role that educational psychologists play in
terms of assessment and intervention in the school system. This can
be viewed as a positive challenge as educational psychologists have
to develop new competencies (e.g. to train and mentor educators in
screening assessment; to apply measurement and evaluation
principles in a variety of school contexts). In addition, psychologists
will have to ensure that their specialist assessment role is not eroded.
However, instead of the specialist assessment being too diagnostically
focused, it also needs to be developmentally focused so that
assessment results can inter alia be linked to development
opportunities that the educator can provide (see Chapter 15).
2.4.4.2 The Employment Equity Act
To date, the strongest stance against the improper use of assessment
measures has come from industry. Historically, individuals were not
legally protected against any form of discrimination. However, with the
adoption of the new Constitution and the Labour Relations Act in
1996, worker unions and individuals now have the support of
legislation that specifically forbids any discriminatory practices in the
workplace and includes protection for applicants as they have all the
rights of current employees in this regard. To ensure that
discrimination is addressed within the testing arena, the Employment
Equity Act (No. 55 of 1998, Section 8) refers to psychological tests
and assessment specifically and states:

Psychological testing and other similar assessments of an
employee are prohibited unless the test or assessment being used:
(a) has been scientifically shown to be valid and reliable
(b) can be applied fairly to all employees
(c) is not biased against any employee or group.

The Employment Equity Act (EEA) has since been amended. The
following was added to the Employment Equity Amendment Act, Act
47 of 2013: ‘and (d) has been certified by the Health Professions
Council of South Africa or any other body which may be authorised by
law to certify those tests or assessments’ (Republic of South Africa,
2013, section 4).
The EEA and the Employment Equity Amendment Act have major
implications for assessment practitioners in South Africa because
many of the measures currently in use, whether imported from the US
and Europe or developed locally, have not been investigated for bias
and have not been cross-culturally validated here (as was discussed
in Section 2.3.5). The impact of this act and its amendment on the
conceptualisation and professional practice of assessment in South
Africa in general is far-reaching as assessment practitioners and test
publishers are increasingly being called upon to demonstrate, or prove
in court, that a particular assessment measure does not discriminate
against certain groups of people. It is thus not surprising that there has
been a notable increase in the number of test bias studies since the
promulgation of the EEA in 1998 (e.g. Abrahams & Mauer, 1999a,
1999b; Lopes, Roodt & Mauer, 2001; Meiring, Van de Vijver,
Rothmann & Barrick, 2005; Meiring, Van de Vijver, Rothmann &
Sackett, 2006; Schaap, 2003, 2011; Schaap & Basson, 2003; Taylor,
2000; Van Zyl & Visser, 1998; Van Zyl & Taylor, 2012; Visser &
Viviers, 2010).
A further consequence of the EEA and its amendment is that there
is an emerging thought that it would be useful if test publishers and
distributors could certify a measure as being ‘Employment Equity Act
Compliant’ as this will aid assessment practitioners when selecting
measures (Lopes, Roodt & Mauer, 2001). While this sounds like a
very practical suggestion, such certification could be misleading. For
example, just because a measure is certified as being compliant does
not protect the results from being used in an unfair way when making
selection decisions. Furthermore, given the variety of cultural and
language groups in this country, bias investigations would have to be
conducted for all the subgroups on whom the measure is to be used
before it can be given the stamp of approval. Alternatively, it would
have to be clearly indicated for which subgroups the measure has
been found to be unbiased and that it is only for these groups that the
measure complies with the act.
The addition of clause (d) in the amendment of the EEA was
controversial. While the control and use of psychological tests falls
under the Professional Board for Psychology, the Association for Test
Publishers (ATP) instituted legal proceedings and argued that the
Board only has procedures in place to classify psychological tests, not
to issue certificates, as is required in section 8(d) in the amended act.
The court found in favour of the ATP and section 8(d), which required
that tests used in industry should be certified, was rescinded (ATP v.
the President of SA, the Minister of Labour and the Health Professions
Council of South Africa, 2017). Nonetheless, the onus is still on the
assessment practitioner to comply with 8(a), (b) and (c) − namely, that
the psychological tests used are valid and reliable, applied fairly, and
are not biased.
The advent of the Employment Equity Act has forced assessment
practitioners to take stock of the available measures in terms of their
quality, cross-cultural applicability, the appropriateness of their norms,
and the availability of different language versions. To this end, the
Human Sciences Research Council (HSRC) conducted a survey of
test use patterns and needs of practitioners in South Africa (Foxcroft,
Paterson, le Roux & Herbst, 2004). Among other things, it was found
that most of the tests being used frequently are in need of adapting for
our multicultural context or require updating, and appropriate norms
and various language versions should be provided. The report on the
survey, which can be viewed at www.hsrc.ac.za, provides an agenda
for how to tackle the improvement of the quality of measures and
assessment practices in South Africa as well as providing a list of the
most frequently used measures, which should be earmarked for
adaptation or revision first.
Having an agenda is important, but having organisations and
experts to develop and adapt tests is equally important. As was
discussed in Section 2.4.3, the HSRC, into which the NIPR and IPER
were incorporated, developed a large number of tests during the
1960s through to the 1980s. For at least three decades the HSRC
almost exclusively developed and distributed tests in South Africa.
However, at the start of the 1990s the HSRC was restructured,
became unsure about the role that it should play in psychological and
educational test development, and many staff with test-development
expertise left the organisation. Consequently, since the South African
adaptation of the Wechsler Adult Intelligence Scales-III (WAIS-III) in
the mid-1990s, the HSRC has not developed or adapted any other
tests, and its former tests that are still in circulation are now distributed
by Mindmuzik Media. According to Foxcroft and Davies (2008), with
the HSRC relinquishing its role as the major test developer and
distributor in South Africa, other role players gradually emerged to fill
the void in South African test development. Some international test
development and distribution agencies such as SHL (www.shl.co.za)
and Psytech (www.psytech.co.za) have agencies in South Africa.
Furthermore, local test agencies such as Jopie van Rooyen &
Partners SA (www.jvrafrica.co.za) and Mindmuzik Media
(www.mindmuzik.com) have established themselves in the
marketplace. Each of these agencies has a research and
development section, and much emphasis is being placed on adapting
international measures for the South African context. However, there
is still some way to go before all the tests listed on the websites of
the agencies that operate in South Africa have been adapted for use
here and have South African norms.
The other encouraging trend is that there is a greater involvement
of universities (e.g. Unisa, Nelson Mandela, North-West, Rhodes,
Witwatersrand, Pretoria, Johannesburg, Stellenbosch, and KwaZulu-
Natal universities) in researching and adapting tests, developing local
norms and undertaking local psychometric studies, and even
developing indigenous tests (e.g. Taylor & De Bruin, 2003).
Furthermore, some organisations such as the South African Breweries
and the South African Police Services (SAPS) that undertake large-
scale testing have undertaken numerous studies to provide
psychometric information on the measures that they use, investigate
bias, and adapt measures on the basis of their findings (e.g. Meiring,
2007).
South African assessment practitioners and test developers have
thus not remained passive in the wake of legislation impacting on test
use. Although the future use of psychological assessment, particularly
in industry and education, still hangs in the balance at this stage, there
are encouraging signs that progress is being made to research and
adapt more measures appropriate for our context and to use them in
fair ways to the benefit of individuals and organisations. The way in
which psychologists and test developers continue to respond to this
challenge will largely shape the future destiny of psychological testing
here.
2.4.4.3 Professional practice guidelines
According to the Health Professions Act (No. 56 of 1974), the
Professional Board for Psychology of the Health Professions Council
of South Africa (HPCSA) is mandated to protect the public and to
guide the profession of psychology. In recent years, the Professional
Board has become increasingly concerned about the misuse of
assessment measures in this country, while recognising the important
role of psychological assessment in the professional practice of
psychology as well as for research purposes. Whereas the
Professional Board for Psychology had previously given the Test
Commission of the Republic of South Africa (TCRSA) the authority to
classify psychological tests and oversee the training and examination
of certain categories of assessment practitioners, these powers were
revoked by the Board in 1996. The reason for this was that the
TCRSA did not have any statutory power and, being a section 21
company that operated largely in Gauteng, its membership base was
not representative of psychologists throughout the country. Instead,
the Professional Board for Psychology formed the Psychometrics
Committee, which, as a formal committee of the Board, has provided
the Board with a more direct way of controlling and regulating
psychological test use in South Africa. The role of the Psychometrics
Committee of the Professional Board and some of the initiatives that it
has launched will be discussed more fully in Chapter 8.
The further introduction of regulations and the policing of
assessment practitioners will not stamp out test abuse. Rather,
individual assessment practitioners need to make the fair and ethical
use of tests a norm for themselves. Consequently, the Psychometrics
Committee has actively participated in the development of
internationally acceptable standards for test use in conjunction with
the International Test Commission’s (ITC) test-use project (see
Section 2.3 and Chapter 8) and has developed certain competency-
based training guidelines (see www.hpcsa.co.za). Furthermore, the
various categories of psychology professionals who use tests have to
write a national examination under the auspices of the Professional
Board before they are allowed to practise professionally. Such a
national examination helps to ensure that professionals enter the field
with at least the same minimum discernible competencies. In Chapter
8, the different categories of assessment practitioners in South Africa
will be discussed, as will their scope of practice and the nature of their
training.
To complicate matters, the introduction of computer-based and
Internet-delivered testing has led to a reconceptualisation of the
psychology professional’s role, especially in terms of test
administration. There are those who argue that trained test
administrators (non-psychologists) can oversee structured, computer-
based testing and that the test classification system in South Africa
should make allowance for this. Indeed, in the landmark court case
that hinged on the definition of what test use means, the honourable
Judge Bertelsmann concluded that the Professional Board needs to
publish a list of tests that are restricted for use by psychologists as it
‘is unlikely that the primarily mechanical function of the recording of
test results should be reserved for psychologists’ (Association of Test
Publishers of South Africa; and Another, Saville and Holdsworth South
Africa v. The Chairperson of the Professional Board of Psychology,
2010, p. 20). Whether it is correct to conclude that test administration
is purely mechanistic will be critically examined in Chapter 9.

There is also a growing recognition among South African
psychologists that many of the measures used in industry and the
educational sector do not fall under the label of ‘psychological tests’.
But the general public does not differentiate between whether or not a
test is a psychological test. Consequently, there is a need to set
general standards for testing and test use in general in South Africa.
This, together with the employment equity and labour relations
legislation in industry, has led to repeated calls to define uniform test-
user standards across all types of tests and assessment settings and
for a central body to enforce them (Foxcroft, Paterson, le Roux and
Herbst, 2004). To date, such calls have fallen on deaf ears.

As you look back over this section, you should become aware that psychological
assessment in South Africa has been and is currently being shaped by:
• legislation and the political dispensation of the day
• the need for appropriate measures to be developed that can be used in a
fair and unbiased way for people from all cultural groups in South Africa
• the role that a new range of test development and distribution
agencies and universities are playing to research, adapt, and
develop measures that are appropriate for our multicultural context
• the need for assessment practitioners to take personal
responsibility for ethical test use
• training and professional practice guidelines provided by statutory (e.g. the
Professional Board for Psychology) and other bodies (e.g. PsySSA, SIOPSA).

❱❱ CRITICAL THINKING CHALLENGE 2.2

Spend some time finding the common threads or themes that run through the
historical perspective of assessment from ancient to modern times, internationally
and in the South African context. You might also want to read up on the
development of psychological testing in Africa (see for example Foxcroft, 2011b).

2.5 Can assessment measures and the process of assessment still fulfil a useful function in modern society?
One thing that you have probably realised from reading through the
historical account of the origins of psychological testing and
assessment is that its popularity has waxed and waned over the
years. However, despite the attacks on testing and the criticism
levelled at it, psychological testing has survived, and new measures
and test-development technologies continue to be developed each
year. Why do you think that this is so? In the South African context do
you think that, with all the negative criticisms against it, psychological
testing and assessment can still play a valuable role? Think about this
for a while before you read some of the answers to these questions
that Foxcroft (1997b), Foxcroft, Paterson, Le Roux, and Herbst (2004),
Laher and Cockcroft (2015) and others have come up with for the
South African context.
Laher and Cockcroft (2015) suggest that negative perceptions of
the value of psychological assessment continue to linger in South
Africa. But they argue that the greater attention given to researching
and developing psychological tests, and using alternative
assessments such as dynamic assessment and qualitative
assessment approaches, is beginning to pay dividends. This is
‘gradually changing the national perception to one that increasingly
sees psychological assessment as an acceptable and valuable
practice’ (p. 309).
In a survey by Foxcroft, Paterson, Le Roux, and Herbst (2004), the
assessment practitioners surveyed suggested that the use of tests
was central to the work of psychologists, and that psychological
testing was generally being perceived in a more positive light. Among
the reasons offered for these perceptions was the fact that tests are
objective in nature, and more useful than alternative methods such as
interviews. Further reasons offered were that tests provide structure in
sessions with clients and are useful in providing baseline information,
which can be used to evaluate the impact of training, rehabilitation or
psychotherapeutic interventions. Nonetheless, despite psychological
testing and assessment being perceived more positively, the
practitioners pointed out that testing and assessment ‘only added
value if tests are culturally appropriate and psychometrically sound,
and are used in a fair and an ethical manner by well-trained
assessment practitioners’ (Foxcroft, Paterson, Le Roux & Herbst,
2004, p. 135).
Psychological testing probably continues to survive and to be of
value because of the fact that assessment has become an integral
part of modern society. Foxcroft (1997b) points out that in his realistic
reaction to the anti-test lobby, Nell (1994) argued:

psychological assessment is so deeply rooted in the global education and
personnel selection systems, and in the administration of civil and criminal
justice, that South African parents, teachers, employers, work seekers, and
lawyers will continue to demand detailed psychological assessments (p. 105).

Furthermore, despite its obvious flaws and weaknesses, psychological
assessment continues to aid decision-making, provided that it is used
in a fair and ethical manner by responsible practitioners (Foxcroft,
Paterson, Le Roux & Herbst, 2004). Plug (1996) has responded in an
interesting way to the criticisms levelled at testing. He contends ‘the
question is not whether testing is perfect (which obviously it is not), but
rather how it compares to alternative techniques of assessment for
selection, placement or guidance, and whether, when used in
combination with other processes, it leads to a more reliable, valid, fair
and cost-effective result’ (Plug, 1996, p. 5). Is there such evidence?
In the field of higher education for example, Huysamen (1996b)
and Koch, Foxcroft, and Watson (2001) have shown that biographical
information, matriculation results, and psychological test results
predict performance at university and show promise in assisting in the
development of fair and unbiased admission procedures at higher-
education institutions. Furthermore, a number of studies have shown
how the results from cognitive and psychosocial tests correlate with
academic performance, predict success, and assist in the planning of
academic support programmes (e.g. Petersen, Louw & Dumont, 2009;
Sommer & Dumont, 2011).
In industry, 90 per cent of the human resource practitioners
surveyed in a study conducted by England and Zietsman (1995)
indicated that they used tests combined with interviews for job
selection purposes and about 50 per cent used tests for employee
development. This finding was confirmed by Foxcroft, Paterson, Le
Roux and Herbst (2004) when, as part of the HSRC’s test-use survey,
it was found that 85 per cent of the industrial psychologists surveyed
used psychological tests in work settings.
In clinical practice, Shuttleworth-Jordan (1996) found that even if
tests that had not been standardised for black children were used, a
syndrome-based neuropsychological analysis model made it possible
to make appropriate clinical judgements that ‘reflect a convincing level
of conceptual validity’ (p. 99). Shuttleworth-Jordan (1996) argued that
by focusing on common patterns of neuropsychological dysfunction,
rather than using a normative-based approach that relies solely on
test scores, some of the problems related to the lack of appropriate
normative information could be circumvented. More recently,
Shuttleworth-Edwards (2012) reported on descriptive comparisons of
performance on the South African adapted WAIS-III and the Wechsler
Adult Intelligence Scale – Fourth Edition (WAIS-IV), which has not yet
been adapted but for which she has developed normative guidelines.
Based on a case study with a 20-year-old brain-injured isiXhosa-
speaking individual, she concluded: ‘Taking both clinical and statistical
data into account, a neuropsychological analysis was elucidated … on
the basis of which contextually coherent interpretation of a WAIS-IV
brain-injury test protocol … is achieved’ (p. 399). Similarly, Odendaal,
Brink and Theron (2011) reported on six case studies of black
adolescents where they used quantitative data obtained using the
Rorschach Comprehensive System (RCS) (Exner, 2003) together with
qualitative follow-up interview information, observations, and a
culturally sensitive approach to RCS interpretation (CSRCS). Although
cautioning that more research was needed, they concluded: ‘in the
hands of the trained and experienced clinician … using the CSRCS
potentially offers invaluable opportunities to explore young people’s
resilience-promoting processes’ (p. 537). Huysamen (2002), however,
cautions that when practitioners need to rely more on professional
judgement than objective, norm-based scores, they need to be aware
that the conclusions and opinions ‘based on so-called intuitions have
been shown to be less accurate than those based on the formulaic
treatment of data’ (p. 31).
The examples cited above suggest that there is South African
research evidence to support the value of psychological test
information when it is used along with other pertinent information and
clinical/professional judgement to make decisions. Furthermore,
Foxcroft (1997b) asserts that in the process of grappling with whether
assessment can serve a useful purpose in South Africa:

attention has shifted away from a unitary testing approach to multi-method
assessment. There was a tendency in the past to erroneously equate testing
and assessment. In the process, clinicians forgot that test results were only
one source of relevant data that could be obtained. However, there now
appears to be a growing awareness of the fact that test results gain in
meaning and relevance when they are integrated with information obtained
from other sources and when they are reflected against the total past and
present context of the testee (Claassen, 1995; Foxcroft, 1997b, p. 231).

To conclude this section, it could probably be argued that the anti-test
lobby has impacted positively on the practice of psychological
assessment in South Africa. On the one hand, psychology
practitioners have taken a critical look at why, and in what instances,
they use assessment measures as well as how to use tests in the
fairest possible way in our multicultural society. On the other hand, it
has forced researchers to provide empirical information regarding the
usefulness of assessment measures, and to adapt or develop
measures that are culturally sensitive.

❱❱ CRITICAL THINKING CHALLENGE 2.3

You have been asked to appear on a TV talk show in which you need to convince
the South African public that psychological assessment has an important role to
play in the democratic, post-apartheid South Africa. Write down what you plan to
say on the talk show and then make a recording of your talk using your cell
phone. Get a classmate to watch the recording and give you feedback.
CHECK YOUR PROGRESS 2.1
2.1 Define the following terms:
• Phrenology
• Graphology
• Psychometrics
• Standardised
• Standards
2.2 Describe how the following has impacted on psychological test
use and development in South Africa:
• Apartheid
• The Employment Equity Act and its amendment.

❱❱ ETHICAL DILEMMA CASE STUDY 2.1

Sam, a psychologist, is asked to assess 20 applicants for a management
position. All the measures that he plans to use are only available in English.
However, only five of the applicants have English as their mother tongue; the
home language of the others is either isiXhosa or isiZulu. Sam knows that,
according to the Employment Equity Act, a measure should not be biased
against any group. He is thus worried that applicants whose home language is
not English will be disadvantaged. As a result, he asks his secretary, who is
fluent in English, isiXhosa, and isiZulu, to act as a translator.

Questions:
(a) Is it appropriate for Sam to use a translator? Motivate your answer.
(b) Will the use of a translator be sufficient to ensure that the
measures are not biased against test-takers who do not have
English as a home language? Motivate your answer.
(c) How else could Sam have approached this dilemma?
(Hint: consult the Ethical Rules of Conduct for Professionals Registered under the
Health Professions Act, 1974 (Department of Health, 2006), which you can source
from www.hpcsa.co.za)

2.6 Conclusion
This chapter introduced you to the history and development of the field
of psychological assessment from ancient to modern times,
internationally and in the South African context. Attention was given to
the factors that shaped the development of psychological assessment
in South Africa from the early years, in the apartheid era, and in the
democratic era. The chapter concluded with a discussion on whether
psychological assessment fulfils a useful function in modern society.
So far we are about one quarter of the way in our journey through
the Foundation Zone on our map. Have you picked up some of the
fundamental assessment concepts already? You should be
developing an understanding that:
• psychological assessment is a complex process
• psychological measures/tests represent a scientific approach to
enquiring into human behaviour; consequently, they need to be
applied in a standardised way and have to conform to rigorous
scientific criteria (i.e. it must be empirically proved that they are
reliable and valid and, especially in multicultural contexts, that they
are not biased).
Let’s move on to the next chapter, where you can explore some of the
foundational assessment concepts further.
3 Basic measurement and statistical concepts
GERT ROODT AND FRANCOIS DE KOCK

CHAPTER OUTCOMES
The next three chapters are the next stops in the Foundation Zone and they
introduce you to the basic concepts needed for understanding the properties
of psychological measures and how test results are interpreted. The topics
covered in these three chapters include basic measurement and scaling
concepts, reliability, and validity. In this first chapter, the following themes
are introduced: levels of measurement, measurement errors, measurement
scales, basic statistical concepts, and test norms.
Once you have studied this chapter you will be able to:
❱ describe the three distinguishing properties of measurement levels
❱ describe and give an example of the four different measurement levels
❱ describe and give an example for each of the different measurement scales
❱ explain for each of the different scaling methods what type
of data (measurement level) it will generate
❱ define and describe the three basic categories of statistical
measures, namely location, variability, and association
❱ name and describe the different types of test norms
❱ apply measurement concepts and evaluate your understanding.

3.1 Introduction
It is important to take some time to consider measurement issues in
psychological assessment. Recall the words by Thorndike (1918)
introduced in Chapter 2: ‘Whatever exists at all exists in some amount.
To know it thoroughly involves knowing its quantity as well as its
quality’ (p. 16). In psychology, when we measure the intangible
characteristics of people, such as their intelligence, personality, or
depression, we place a great deal of emphasis on measurement. In
psychological measurement (psychometrics), we are mostly dealing
with the measurement of abstract, intangible constructs. Although
these constructs are not visible or tangible, it does not mean that they
do not exist. We can still measure their existence in the different ways
that they manifest themselves. For instance, we can measure
intelligence by assessing people’s ability to solve complex problems,
by assessing their mental-reasoning ability, or sometimes even by the
way people drive.
Guilford (1936) referred to the great German philosopher
Immanuel Kant who once asserted that the sine qua non of a science
is measurement and the mathematical treatment of its data. Kant’s
view was that psychology could never rise to the dignity of a natural
science because, at the time, it was not possible to apply quantitative
methods to its data. Guilford concluded that if Kant today were to
browse through one of the contemporary journals of psychology, he
would at least be forced to conclude that psychologists as a group are
expending a large amount of energy in maintaining the pretence that
their work is science. Following this, Guilford then posed the question:
What would be the motive of modern-day scientists for expending such an
arduous amount of energy to express their findings in terms of
statistical probability and significance coefficients? He then concluded
it is all in our ‘struggle for objectivity. Objectivity is after all the
touchstone of science’ (1936, p. 1) (own emphasis in italics). This
objectivity can be achieved through effective measurement. Runyon
and Haber define measurement as follows:

Measurement is the assignment of numbers to objects or events
according to sets of predetermined (or arbitrary) rules, or to frame it
more precisely in psychometric terms, the transformation of
psychological attributes into numbers (1980, p. 21).

This chapter introduces basic measurement and statistical concepts
that will help you to understand and work with basic psychometric
principles. If you have not completed a course in statistics, we suggest
that you invest in an introductory text to statistics.

3.2 Levels of measurement


In psychometrics there are numerous systems by which we can
assign numbers. These systems may generate data that have
different properties. Let us have a closer look at these properties now.
3.2.1 Properties of measurement scales
There are three properties that enable us to distinguish between
different scales of measurement, namely magnitude, equal intervals,
and absolute zero (De Beer, Botha, Bekwa & Wolfaardt, 1999).
3.2.1.1 Magnitude
Magnitude is the property of ‘moreness’. A scale has the property of
magnitude if we can say that one attribute is more than, less than, or
equal to another attribute. Let us take a tape measure, for example.
We can say that one person is taller or shorter than another, or
something is longer or shorter than something else, once we have
measured their/its height/length. A measure of height/length therefore
possesses the property of magnitude.
3.2.1.2 Equal intervals
A scale assumes the property of equal intervals if the difference
between all points on that scale is uniform. If we take the example of
length, this would mean that the difference between 6 and 8 cm on a
ruler is the same as the difference between 10 and 12 cm. In both
instances the difference is exactly 2 cm. In Example A in Table 3.1
(see below), an example of an equal-interval response-rating scale is
provided. This response-rating scale represents equal intervals and it
would typically generate continuous data. In all instances the interval
between scores is exactly one.
Table 3.1 Comparison of equal-interval and categorical rating scales

EXAMPLE A

Totally disagree   1   2   3   4   5   Totally agree

EXAMPLE B

Totally disagree | Disagree somewhat | Unsure | Agree somewhat | Totally agree
        1        |         2         |    3   |       4        |       5

In the case of the scale in Example B, numbers are assigned to
different categories. In this case, the distances between categories
are not equal and this scale would typically generate categorical data.
Although the scale numbers in Example A possess equal intervals,
one should however note that there is evidence that psychological
measurements rarely have the property of precise equal intervals. For
example, although the numerical difference between IQ scores of 50
and 55 is the same as the difference between 105 and 110, the
qualitative difference of 5 points at the lower level does not mean the
same in terms of intelligence as the difference of 5 points at the higher
level.
3.2.1.3 Absolute zero
Absolute zero (0) is obtained when there is absolutely nothing
present of the attribute being measured. If we take the example of
length again, 0 cm means that there is no distance. Length therefore
possesses the property of absolute zero. By the same token, if you
were measuring wind velocity and got a zero reading, you would say
that there is no wind blowing at all.
With many human attributes it is extremely difficult, if not
impossible, to define an absolute zero point. For example, if we
measure verbal ability on a scale of 0 to 10, we can hardly say that a
zero score means that the person has no verbal ability at all. There
may be a level of ability that the particular scale does not measure;
there can be no such thing as zero ability. One exception, however, is
in the domain of psychophysical measures of perception. For
example, a person may show zero ability to detect minute levels of a
visual or auditory stimulus. This example represents a rare case
where absolute zero scores are encountered in psychological
assessment. Some might argue that behaviour rating scores used in
assessment centres may represent another possible exception as
these scales may include absolute zero scores if the rating scale
includes a ‘not observed’ category, to indicate that a candidate failed
to demonstrate any of the required behaviours linked to a particular
competency.

3.2.2 Categories of measurement levels


Some aspects of human behaviour can be measured more precisely
than others. In part, this is determined by the nature of the attribute
(characteristic) being measured, as well as the level of measurement
of the scale used.
Measurement level is an important, but often overlooked, aspect in
the construction of assessment measures because it remains an
important consideration when data sets are analysed. Measurement
serves different functions, such as sorting, ordering, rating, and
comparisons. As a consequence numbers are used in different ways,
namely to name (i.e. group according to a label – nominal numbers);
to represent a position in a series (i.e. to order – ordinal numbers);
and to represent quantity (i.e. rating or comparing – cardinal
numbers).
Each of these measurement approaches results in different types
of measurement data. Two broad groups of measurement data are
found; namely categorical data and continuous data. Categorical data
is in the form of discrete or distinct categories (e.g. males and
females). Only a specific set of statistical procedures can be used for
data analysis on categorical (discrete) data sets. Categorical
measurement data, in turn, can be divided into nominal and ordinal
measurement levels:
• Nominal: With a nominal scale, numbers are assigned to an attribute
to describe or name it, such as telephone numbers or postal box
numbers. For example, a nominal measurement scale is used if the 11
official languages of South Africa are coded from 1 to 11 in a data set
or if the 15 positions of a rugby team list players from 1 to 15.
Languages or rugby players can then be sorted according to these
numbers.
• Ordinal: An ordinal scale is a bit more refined than a nominal scale in
that numbers are assigned to objects that reflect some sequential
ordering or amounts of an attribute. For example, people are ranked
according to the order in which they finished a race or in terms of the
results of their class test. In psychological assessment, ordinal level
measurement allows us to say, for example, that Simphiwe
outperformed Joseph on a test, although the magnitude of this
difference cannot be determined.
Continuous data represent measurements on a continuum that can be
broken down into smaller units (e.g. height or weight). Furthermore,
continuous data can be divided into interval and ratio measurement levels:
• Interval: In an interval scale, equal numerical differences can be
interpreted as corresponding to equal differences in the characteristic
being measured. An example would be a temperature scale such as
degrees Celsius: a one-degree difference between two temperature
readings remains constant irrespective of where on the temperature
scale the difference is observed (e.g. the difference between 16° and
17° is the same as the difference between 4° and 5°). IQ test scores
are usually considered to be an interval scale. Why? A difference
between two individuals’ IQ scores can be numerically determined
(e.g. the difference between IQ scores of 100 and 150 is 50, or
between 100 and 50 is also 50). However, as was pointed out in
Section 3.2.1.2, the meaning of a difference between two IQ scores
would be very different if the difference was between 150 and 100
rather than between 50 and 100. Example A (Table 3.1) can be
classified as an interval scale.
• Ratio: This is the most refined level of measurement. Not only can
equal difference be interpreted as reflecting equal differences in the
characteristic being measured, but there is a true (absolute) zero
that indicates complete absence of what is being measured. The
inclusion of a true zero allows for the meaningful interpretation of
numerical ratios. For example, length is measured on a ratio scale.
If a table is 6 m away from you while a chair is 3 m away from you,
it would be correct to state that the table is twice as far away from
you as the chair. Can you think of an example related to a
psychological characteristic? This is not common in our discipline.
The scores on psychological measures, depending on the
response scales used, either represent ordinal or, at the most,
interval data rather than ratio measurement, since they do not have
a true zero point. However, as noted, psychophysical measures of
perception are exceptions as these often include zero scores, in
addition to the aforementioned properties like equal intervals.
Continuous data, if normally distributed, are ideally suited for a wide
range of parametric statistical analysis procedures. Although many
mainstream measures used in psychology (e.g. survey measures of
attitudes, personality traits, etc.) are strictly speaking ordinal level
measures (and therefore categorical), we can treat them as interval
level measures (and therefore continuous) provided that they meet
particular conditions.
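To make these distinctions concrete, the short sketch below shows, in Python, which summary statistics are meaningful at each level of measurement. This is a minimal illustration only; the variable names and sample values are invented for the example.

import statistics

# Nominal: numbers are mere labels - counting and the mode are meaningful, averaging is not
home_language = [1, 3, 1, 2, 1]            # hypothetical codes, e.g. 1 = isiZulu, 2 = isiXhosa, 3 = English
print(statistics.mode(home_language))      # the most frequent category

# Ordinal: order is meaningful, but not the size of the gaps - use the median
race_positions = [1, 2, 3, 4, 5]
print(statistics.median(race_positions))

# Interval: equal differences are meaningful, so means make sense,
# but ratios do not (no true zero): 20 degrees C is not 'twice as warm' as 10 degrees C
temperatures_celsius = [16, 17, 4, 5]
print(statistics.mean(temperatures_celsius))

# Ratio: a true zero makes ratios meaningful: 6 m is twice as far as 3 m
distances_m = [6.0, 3.0]
print(distances_m[0] / distances_m[1])     # 2.0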
Selecting a particular measuring instrument that will generate data
suitable to your needs is no safeguard against measurement errors.
Psychometricians and researchers should therefore always be aware
of possible measurement errors.

3.3 Measurement errors


In a perfect world, psychological tests would reflect precisely and
consistently the actual quantity of any particular attribute. However,
some error is involved in any type of measure. Unfortunately, test
scores reflect both the characteristic we are trying to measure with a
particular test, as well as some degree of error.
Measurement errors may have serious consequences in
psychological assessment. First, they may negatively affect the quality
of decisions we make on test scores for different applicants, given that
we are not entirely sure that a particular score actually reflects the
actual characteristics of an applicant. Second, they can also be a
statistical nuisance in psychological research because errors in
measurement can artificially ‘deflate’ correlation coefficients computed
between scores from two measures (Nunnally, 1978). Finally, given
that scores obtained from norm groups may contain error, the
resulting norm tables from norming studies may be compromised.
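The 'deflating' effect that random measurement error has on correlations can be demonstrated with a short simulation. The Python sketch below is a hypothetical illustration (all parameter values are invented): two measures that reflect exactly the same true scores correlate well below 1.0 once independent random error is added to each.

import random
import statistics

random.seed(1)

def pearson_r(x, y):
    # Plain Pearson correlation between two lists of numbers
    mx, my = statistics.mean(x), statistics.mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

true_scores = [random.gauss(50, 10) for _ in range(2000)]

# Classical test theory: observed score = true score + random error
obs_x = [t + random.gauss(0, 6) for t in true_scores]
obs_y = [t + random.gauss(0, 6) for t in true_scores]

# Well below 1.0, although both measures share identical true scores
print(pearson_r(obs_x, obs_y))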
In most measurement settings two broad types of errors may
occur. They are random errors and systematic errors. These errors
will be briefly introduced here and discussed in more detail in the next
chapters. As an aside, however, it is important to note that scores
generated by measuring instruments can potentially be negatively
affected by either sampling or measurement errors, or both of these
errors. Stated differently, this means that causes of measurement
errors can be attributed to sampling, the measure itself, or the
combination of these two factors.

3.3.1 Random error

Random measurement error can be defined as ‘variability of errors
of measurement that function in a random or nonsystematic manner’
(Price, 2017, p. 254).

Random error acts as statistical ‘noise’ in data and, therefore, the
degree to which random error is evident influences how much faith we
can place in test scores as reliable indicators of whichever
characteristic we are trying to measure. Random error in
psychological measures can arise from various chance factors within
the individual or environment, including guessing, complex or vague
language in questions, test anxiety, momentary shifts in attention,
disruptions that occur when individual respondents complete an online
test, and so forth (see Nunnally, 1978, and Cronbach, 1970, for more
complete lists of potential sources of measurement errors).

3.3.2 Systematic error

Systematic measurement errors are defined as constant errors of
measurement that occur when all test scores are excessively high or
low (Price, 2017).
Systematic error or measurement bias is present where test results
show a constant tendency to deviate in a particular direction. For
example, a weight scale incorrectly calibrated may show that
everyone in your family weighs 3 kg more than their ‘true’ weight. So
you can see that the error is constant (+3 kg) for everyone. Similarly,
psychological test scores can be consistently ‘off’ in one direction
when questions systematically disadvantage one group of test-takers.
For example, people from one cultural group with more knowledge of
a particular concept will do better than people from another cultural
group. Also, when test conditions are generally more favourable for
one group of test-takers, they will systematically score higher. Imagine
one group of job applicants completes a paper-and-pencil intelligence
test in a cool, air-conditioned test venue with good lighting, but
another group completes the same measure in an uncomfortably hot
mud building with no electricity. Which group do you think is likely to
score better?
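The difference between the two error types can be made concrete with a small simulation of the weight-scale example above. The Python sketch below uses invented values: averaging repeated readings largely cancels out the random error, but the constant +3 kg systematic error remains.

import random
import statistics

random.seed(42)

true_weight_kg = 70.0
bias_kg = 3.0    # systematic error: the scale is incorrectly calibrated

# Ten repeated readings: true value + constant bias + random noise
readings = [true_weight_kg + bias_kg + random.gauss(0, 0.5) for _ in range(10)]

# The mean hovers around 73 kg: the random errors largely cancel out,
# but the +3 kg systematic error is present in every reading
print(statistics.mean(readings))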
The way in which measurement scales are constructed can help
psychometrists and researchers to counter the effects of
measurement errors, specifically those related to test development,
non-response error, and response bias. Test items that are well
written and pilot-tested are likely to show lower levels of measurement
error. Non-response error is the real difference between the actual
respondents in a survey and those who have decided not to
participate (if they could be objectively surveyed). Finally, response
bias is a tendency for respondents to respond to questions in a set
manner. These errors can be minimised if the principles of sound
scale construction are adhered to.

3.4 Measurement scales


3.4.1 Different scaling options
In deciding on possible measurement formats of these abstract
constructs we have a choice of using different measuring scales. The
choices we make in this regard can potentially lead to generating data
on different measurement levels. Let’s consider some of the most
frequently used scales.
3.4.1.1 Category scales
Category scales are, as the name suggests, scales where the
response categories are categorised or defined. According to both
Torgerson (1958) and Schepers (1992) a scale loses its equal interval
properties if more than two points on the response scale are
anchored. Usually several response categories exist in category
scales and they are ordered in some ascending dimension; in the
case below – the frequency of use. This type of scale would generate
categorical data; more specifically ordinal data. For example, Figure
3.1 represents a category scale that requires respondents to indicate
how frequently they make use of public transport. This particular
example is also known as a frequency response scale.
3.4.1.2 Likert-type scales
The Likert scale or summated-rating method was developed by
Rensis Likert (1932). Likert-type scales are frequently used in applied
psychological research. Respondents have to indicate to what extent
they agree or disagree with a carefully phrased statement or question.
These types of items, as shown in the examples below, will generate
categorical data. Examples of such response scales follow (see Figure 3.2):

Figure 3.1 Category scale


Figure 3.2 Likert-type scales

The supervisor can be rated on different dimensions that then provide
a profile of the supervisor according to the different, rated dimensions.
The combined scores on several of these items would provide a
composite (summated) score of the supervisor in terms of how
favourable or unfavourable he or she was rated overall. In the case of
the second response scale example, a zero is used to indicate a
neutral or uncertain response. Any idea what problem will occur if
scores on such response scales are added? Respondents may opt for
the zero if they find the choice difficult, and these clustered neutral
responses result in an abnormally peaked (leptokurtic) response
distribution.
Likert-type scales are popular in measures used in survey
research and operational testing programmes due to their many
benefits. They are easy to understand and most test-takers and
survey respondents are familiar with this scale type. These scales are
also fairly easy to implement in measures of a wide variety of
characteristics (e.g. attitudes, personality, opinions, interests, and so
forth) and the resulting scales are easy to score and analyse further
(e.g. in terms of the psychometric properties of items and scales).
3.4.1.3 Semantic differential scales
Semantic differential scales provide a series of semantic differentials
or opposites. The respondent is then requested to rate a particular
person, attribute or subject in terms of these bipolar descriptions of the
person or object. Only the extreme poles are in this instance
described or anchored. An example of such a scale is the following:

Figure 3.3 Semantic differential scale

The scores obtained on the different dimensions would provide an


overall profile of the supervisor. In this format, equal intervals are
assumed.
3.4.1.4 Intensity scales
Intensity scales are very similar to the response scales described in
Section 3.4.1.3. In this case, several questions are posed; the
extreme poles of the response scales are anchored, and there are no
category descriptors between these two anchors. Examples of
two items and their respective response scales (five- and seven-point)
are listed below:

Figure 3.4 Intensity scale

The number of response categories (e.g. five or seven) may be


decided on, depending on the sophistication level of the respondent
group. It is generally accepted that more sophisticated groups can
deal more effectively with a wider range of response options. In this
instance, the assessor will assume equal interval properties of the
scale, which makes it an interval scale.
3.4.1.5 Constant-sum scales
In the constant-sum method (Torgerson, 1958) the respondent is
requested to allocate a fixed percentage or proportion of marks
between different available options, thereby indicating the relative
value or weight of each option. The following is an example of such a
scale:

Figure 3.5 Constant-sum scale

By framing several similar questions, the attributes can be compared
with different characteristics, thereby determining their relative
weights.
3.4.1.6 Paired-comparison scales
Paired-comparison scales (one of Torgerson’s fractioning techniques,
1958) function on the same principle as described above, except that
they only compare two characteristics or attributes at a time.
Respondents are in this instance requested to divide 100 marks
(points) between two attributes. This procedure is
then repeated where the same attribute is also systematically
compared with all other attributes (compare Claassen, Schepers &
Roodt, 2004). The following example illustrates this scale:

Figure 3.6 Paired-comparison scale

This procedure allows one to determine an overall comparative
weighting or ranking for each value.
3.4.1.7 Graphic rating scales
In the case of graphic rating scales the response scale is presented in
the form of a visual display. The respondent would then indicate which
option best represents his or her choice or attitude. The smiling faces
scale and ladder scale are frequently used variations in this case. An
example of the smiling faces scale is displayed below:

Figure 3.7 Graphic rating scale

A word of caution is needed here: smiling or non-smiling faces may
mean different things in different cultures. If this response format is
used in a multicultural context, it should therefore be kept in mind that
the obtained ratings may be culturally biased.
3.4.1.8 Forced-choice scales
In some cases where intra-individual differences have to be assessed,
one can also make use of forced-choice scales. The individual would
in this instance choose the option that describes him or her best.
Provided below is an example of this scale that would generate
nominal data:

Figure 3.8 Forced-choice scale generating nominal data

The following item example will generate ordinal data:

Figure 3.9 Forced-choice scale generating ordinal data

Finally, ability-based measures (e.g. intelligence measures) also
rely on forced-choice scales. These measures often have multiple
response options, although only a single answer is coded as correct.
In this case, the resulting data are categorical and dichotomous (i.e.
someone's answer is scored either 0 for incorrect, or 1 for correct).
3.4.1.9 Ipsative scales (and scoring)
Ipsative scales represent a special type of forced-choice scale.
Ipsative scales also force respondents to compare two or more item
options from different scales and pick one that is most preferred.
However, while they share the forced-choice format, their ultimate
purpose is to compare a test-taker's score on one scale within the test
to another scale within the same test (Cohen & Swerdlik, 2010). As
such, ipsative tests indicate the relative strengths of the person being
tested: each characteristic is compared within the person, rather than
with other people. Therefore, these measures are scored in a different
way than traditional summative scales: a given set of responses for a
person across items always sums to the same total (Meade, 2004).
According to research by CEB (now Gartner) in The Occupational
Personality Questionnaire Revolution (2013–2016, p. 3), the OPQ32i is
one of the best examples of a forced-choice test. The OPQ32i consists
of 104 blocks of four statements measuring different traits. For each
block respondents have to choose one statement that is 'Most like me'
and one that is 'Least like me'. Figure 3.10 shows an example of such
a block:

Figure 3.10 Ipsative forced-choice scale generating ipsative data
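To make the scoring logic concrete, here is a minimal Python sketch of how such a forced-choice block might be scored ipsatively. The 2/1/0 point convention and the trait labels are illustrative assumptions, not the OPQ32i's actual scoring rules.

```python
# Minimal sketch of ipsative scoring for one forced-choice block.
# Assumption (illustrative only): 'most like me' earns 2 points,
# 'least like me' 0 points, and the two remaining statements 1 point each.

def score_block(traits, most, least):
    """Return points per trait for one block of four statements."""
    scores = {}
    for trait in traits:
        if trait == most:
            scores[trait] = 2
        elif trait == least:
            scores[trait] = 0
        else:
            scores[trait] = 1
    return scores

block = ["persuasive", "organised", "caring", "competitive"]
print(score_block(block, most="organised", least="competitive"))
# {'persuasive': 1, 'organised': 2, 'caring': 1, 'competitive': 0}
# Every block sums to 4, so each person's grand total across all traits
# is the same constant: only the relative profile differs (ipsative data).
```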

3.4.1.10 Guttman scales


Guttman scales are normally used in social psychology, or in attitude
and prejudice research. This method was initially developed to test the
unidimensionality of a set of items; stated differently, to test whether
the items belong to the same attitude dimension. A total attitude is
defined and a set of sample items is developed to represent this
universe of possible items for the total attitude. A respondent is then
requested to respond to each item of such a scale. The number of
items the respondent answers in the affirmative reflects his or her
score on this attitude. An example of such a scale is presented below:

Figure 3.11 Guttman scale

This example would, for instance, assess the degree of exposure a
person has had to crime.
Several considerations have to be kept in mind when designing
items and response scales.

3.4.2 Considerations in deciding on a scale format


If you were to design a measurement scale, you would have to keep
several considerations in mind, because they affect the properties of
the data that you would generate with this measure.
3.4.2.1 Single dimensional versus composite scale
Is your scale multidimensional or does it measure only a single
dimensional construct? This consideration is determined by the
underlying theoretical definition of the construct. If the theoretical
definition suggests a single dimension, the scale should then be
operationalised in a similar manner. In the case of multidimensional
theoretical definitions the scale should then also be multidimensional;
that is, measuring multiple facets. A good example of a
multidimensional measure is the Utrecht Work Engagement Scale
(Schaufeli & Bakker, 2003) which comprises three facets, namely
absorption, dedication, and vigour.
3.4.2.2 Question- versus statement-item formats
A consideration that test or questionnaire designers often have to face
is whether they should make use of question- or statement-item
formats. Hanekom and Visser (1998) experimentally tested whether
items formulated in question or statement formats would yield different
response distributions, and found no significant differences. Some
researchers (e.g. Schepers, 1992;
Swart, Roodt, & Schepers, 1999), however, suggest that statement
items generate extreme responses resulting in response distributions
that are approximately bimodally shaped (double humps) as opposed
to question items that result in more normally distributed (mesokurtic)
response distributions.
3.4.2.3 Type of response labels versus unlabelled response categories
If assessors need to determine which type of response categories
they are going to use, they may decide to use category descriptors
and assign numbers to each category (measurement on a nominal
level). Alternatively, assessors can decide to omit all category
descriptors except those at the extreme ends, as in the case of semantic
differential scales or intensity scales (measurement on an interval
level). These two approaches will lead to distinctively different levels
of measurement and data characteristics.
3.4.2.4 Single attribute versus comparative rating formats
Assessors may decide to assess only one particular construct (e.g. job
satisfaction) with an assessment instrument. All items of this
instrument will thus tap into this construct. Or alternatively, they can
decide to comparatively assess different constructs (e.g. price, quality
or service delivery of a product) in order to determine their relative
importance or weight as perceived by the respondents.
3.4.2.5 Even-numbered versus odd-numbered rating options
A question assessors are frequently confronted with is: How many
response categories should be included for each item? Should it be
five or seven? Or should it be four or six? Or should there be an even
or an odd number of response options? There is no easy answer – it
all depends on who the targeted respondents are. It is known that less
sophisticated respondents find it conceptually difficult to deal with
many response options. For instance, if your respondents only have
primary school education (e.g. Grade 7) a three- or four-point
response scale may be most appropriate.
Fewer response categories (three or four) are known to have a
truncation effect (restriction of range), whereas more response
categories (six or seven) result in a wider range of scores; measures
with more categories therefore tend to have higher reliability
coefficients. An odd number of response categories may also result in
respondents opting for the neutral middle position, an option that
even-numbered response scales do not offer.
3.4.2.6 Ipsative versus normative scale options
Forced-choice scales lead to ipsative measurements that have
restricted application value (intra-individual comparisons only),
because there is no normative comparison base. Normative scales, on the other hand,
have the advantage that assessors can establish a normative
comparison base, which gives the scale a far wider application
potential. There are important statistical considerations when using
ipsatively scored measures (see Meade, 2004).

3.5 Basic statistical concepts


Psychological assessment measures or survey instruments often
produce data in the form of numbers. We need to be able to make
sense of these numbers. Basic statistical concepts can help us here,
as well as when it comes to establishing and interpreting norm scores.
Statistical concepts and techniques can also help us to understand
and establish basic psychometric properties of measures such as
validity and reliability.
It is recommended that you study Sections 3.5.1 to 3.5.4 below
while working through the relevant topics in an introductory statistical
textbook.
3.5.1 Displaying data
We cannot sensibly interpret a raw set of data consisting of rows and
columns (a matrix) of figures. One meaningful way to make sense of the
data is to plot it; that is, to display it graphically by constructing either a
histogram or a line graph. From this graphic representation you can
get an idea whether the characteristic being measured is normally
distributed – a distribution of scores where most people score in the
area of the mean, and increasingly fewer people achieve at the
extremely high or low tails of the distribution – or whether the
distribution is skewed in any way, in which case it could tell you
something about how difficult or easy a test is. In an ‘easy’ test, many
people would answer the questions correctly and the distribution
would be negatively skewed. In a difficult test, few people would
answer the questions correctly and the distribution would thus be
positively skewed (see Figure 3.12).
Figure 3.12 Graphs showing normally distributed test scores (top); negatively skewed test
scores (centre); and positively skewed test scores (bottom)

3.5.2 Measures of central tendency


Measures of central tendency are numerical values that best reflect
the centre of a distribution of scores. As such, central tendency
measures can tell us a lot about how high or low (on average) a group
of individuals scored on a measure. Often, we like to compare
individuals’ test scores to some measure of central tendency as a
norm for comparison. There are three measures of central tendency –
the mean (denoted by X̄, which is the arithmetic average, pronounced
as 'X bar'), the median (denoted by Me, which is the middle point of
the distribution), and the mode (denoted by Mo, which is the most
frequently occurring score). No formula exists for calculating the
mode, except for counting the most frequently recorded score.
The formula for calculating the arithmetic sample mean (denoted
as X̄) is as follows (Howell, 1995):

X̄ = ∑X / N
formula 3.1
The mean is thus calculated by the sum (∑) of all scores divided by
the number (N) of the scores.
The median presents the score where, when the data are
arranged in numerical order, 50 per cent of the distribution lies below
that point and the other 50 per cent above it. As such, the median is
also called the 50th percentile. The median (Me) is not that easy to
derive by formula (see Price, 2017, p. 34) but can be readily obtained
from standard statistical software (e.g. MS Excel, SPSS, etc.). The
median is a very useful statistic because it is not influenced as much
by one or a few extremely low or high scores. In this sense, it is a
more robust indicator of central tendency than the sample mean when
extreme scores are present.
The median position (or location) is the location of the median in
an ordered series, which simply means that the median is the xth
number in an ordered series. The formula for calculating the median
position (Med) is as follows (Howell, 1995):

median position = (N + 1) / 2
formula 3.2
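As a quick check on these definitions, the following short Python sketch computes the mean, mode, and median for the machine-operator data presented in Table 3.2 below (the results match the answers to Self-Assessment Exercise 3.1):

```python
import statistics  # statistics.multimode requires Python 3.8+

# Aptitude scores copied from Table 3.2
x_odd   = [5, 8, 7, 8, 7, 6, 5, 8, 6, 2]          # column xo
x_even  = [6, 10, 7, 9, 6, 8, 4, 7, 5, 3]         # column xe
x_total = [11, 18, 14, 17, 13, 14, 9, 15, 11, 5]  # column X

print(statistics.mean(x_odd))        # 6.2
print(statistics.multimode(x_even))  # [6, 7] -> two modes
n = len(x_total)
print((n + 1) / 2)                   # median position = 5.5
print(statistics.median(x_total))    # median value = 13.5
```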
Table 3.2 Aptitude and job performance scores of machine operators

OPERATOR   APTITUDE TEST SCORES (X)                                        JOB PERFORMANCE SCORES (Y)
           Score for odd   Score for even   Product of    Combined         Job performance
           items (xo)      items (xe)       xo·xe         score (X)        score (Y)
A          5               6                30            11               7
B          8               10               80            18               8
C          7               7                49            14               10
D          8               9                72            17               8
E          7               6                42            13               6
F          6               8                48            14               9
G          5               4                20            9                6
H          8               7                56            15               9
I          6               5                30            11               8
J          2               3                6             5                6
Sums       ∑xo = 62        ∑xe = 65         ∑xoxe = 433   ∑X = 127         ∑Y = 77
Also: ∑xo∑xe = 4030 and ∑X∑Y = 9779
In Table 3.2 above we have a fictitious data set of 10 machine
operators. The column on the extreme left presents the individual
machine operators (A–J); the second and third columns present their
aptitude scores for the odd- and even-numbered items, the fifth
column their combined (total) scores, and the sixth column their job
performance scores. We will use this example to do some very basic
calculations to illustrate the different concepts that we want to
introduce.
Now see if you can determine the following values in Exercise 3.1.

SELF-ASSESSMENT EXERCISE 3.1

Calculate the following values for the test scores provided in Table 3.2 using the formulae
provided above (please note that upper- and lower-case descriptors are used):
(a) What is the mean score for column xo?
(b) What is the mode for scores in column xe?
(c) What is the median position for scores in column X?
You can find the answers at the end of the chapter.

3.5.3 Measures of variability


A measure of variability is a statistic that indicates the degree to which
individual scores are distributed around the mean. The range (i.e. the
difference between the largest and smallest score) is the simplest
measure of variability to compute.

SELF-ASSESSMENT EXERCISE 3.2

Can you determine the following ranges of the different sets of scores in Table 3.2?
(a) for the scores in column X?
(b) for the scores in column Y?
You can find the answers at the end of the chapter.

The variance gives an indication of how scores vary about the mean,
or how they deviate from the mean (X − X̄). You will find that some of
these deviation scores have positive values while others have negative
values, with the result that the sum of these deviations always equals
zero. By squaring the deviations one can overcome the problem of the
zero sum (squared minuses become pluses) and the outcome will
therefore always have a positive value. The sample variance (denoted
as s²) can now be calculated as the sum of the squared deviations
about the mean, divided by n − 1. According to Howell (1995) the
formula for the variance is as follows:

s² = ∑(X − X̄)² / (n − 1)
formula 3.3

SELF-ASSESSMENT EXERCISE 3.3


(a) Can you calculate the variance for the scores in column xo of Table 3.2?

The standard deviation (s), also designated as SD, represents the
most common form of variability that can be computed. The standard
deviation is a useful estimator of how much the scores in a population
of test-takers are likely to be spread out − that is, are they 'bunched
up' so that most scores are similar, or are they 'scattered' throughout
the possible score range? It is simply the square root of the variance
and is also referred to as sigma (Runyon & Haber, 1980):

s = √s² = √[∑(X − X̄)² / (n − 1)]
formula 3.4
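The following minimal Python sketch applies formulae 3.3 and 3.4 to column xo of Table 3.2:

```python
# Sample variance and standard deviation for column xo of Table 3.2
x_odd = [5, 8, 7, 8, 7, 6, 5, 8, 6, 2]

mean = sum(x_odd) / len(x_odd)                   # 6.2
squared_devs = [(x - mean) ** 2 for x in x_odd]  # squared deviations
variance = sum(squared_devs) / (len(x_odd) - 1)  # 31.6 / 9 ≈ 3.51
sd = variance ** 0.5                             # ≈ 1.87
print(round(variance, 2), round(sd, 2))
```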

SELF-ASSESSMENT EXERCISE 3.4


(a) What is the standard deviation of the obtained variance in
column xe of Table 3.2?

3.5.4 Measures of association


Looking at the psychometric data in Table 3.2, you might wonder: Do
the machine operators who score high on the aptitude test also have
high job performance scores? When you ask whether one variable (X)
is related to another variable (Y), the statistic you have to compute is
the Pearson product-moment correlation coefficient (also denoted
as r). The correlation coefficient is calculated as the covariance of XY
divided by the product of the standard deviations of the two moments
(X and Y), and can be portrayed as (Howell, 1995):

r = covXY / (sX · sY)
formula 3.5
Runyon and Haber (1980) propose that this formula can be simplified
further by replacing the variances and covariances with their
computational formulae and by resolving the equation. This formula is
also known as the Pearson product-moment correlation:

r = (N∑XY − ∑X∑Y) / √{[N∑X² − (∑X)²] [N∑Y² − (∑Y)²]}
formula 3.6
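As an illustration, the computational formula can be translated directly into Python; here it is applied to the combined aptitude scores (X) and job performance scores (Y) from Table 3.2:

```python
from math import sqrt

def pearson_r(x, y):
    """Computational formula for the Pearson product-moment correlation."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    return (n * sum_xy - sum_x * sum_y) / sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))

aptitude    = [11, 18, 14, 17, 13, 14, 9, 15, 11, 5]  # X
performance = [7, 8, 10, 8, 6, 9, 6, 9, 8, 6]         # Y
print(round(pearson_r(aptitude, performance), 2))     # ≈ 0.61
```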

SELF-ASSESSMENT EXERCISE 3.5


(a) Now see if you can calculate the Pearson correlation between
columns xo and xe from the values in Table 3.2.

Having determined that a relationship exists between two variables,
for instance the aptitude scores of the machine operators and their job
performance, you can compute a simple regression equation so that
job performance (a typical outcome or criterion measure) can be
predicted from the aptitude test scores (a predictor measure). We can
use this information to build a prediction model. The greater the
magnitude (size) of the correlation between the two measures (X and
Y), the more accurate the prediction will be (of the estimated value for
Y, or Ŷ, pronounced as 'y-hat', from X). However, we need to know
the values of the slope (b) and the intercept (a) first, before we can
resolve the simple regression equation (Howell, 1995):

Ŷ = bX + a
formula 3.7
The value of the slope (b) and the intercept (a) can be calculated once
we know the value of r (Howell, 1995).

b = r (sY / sX)
formula 3.8
Since we know the value of b, we can now proceed by calculating the
value of a by using the following formula (Howell, 1995):

a = Ȳ − bX̄
formula 3.9

SELF-ASSESSMENT EXERCISE 3.6


(a) Can you calculate the values of a and b from your calculated
values of ∑X and ∑Y in Table 3.2?
(b) With the values of a and b known now, what would the
predicted values of Ŷ be, if we have an X value of 15?

In the case of multiple predictors (X1 and X2), the equation slightly
changes to become a multiple regression equation:

Ŷ = b1X1 + b2X2 + a
formula 3.10
In the case of the multiple regression equation, b1 and b2 denote
the respective regression weights of the predictor variables X1 and X2,
with a as the value of the intercept.
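The following sketch pulls the simple-regression steps together for the Table 3.2 data, using the equivalent computational form of the slope, b = (N∑XY − ∑X∑Y) / (N∑X² − (∑X)²):

```python
# Simple regression of job performance (Y) on aptitude (X), Table 3.2 data
x = [11, 18, 14, 17, 13, 14, 9, 15, 11, 5]  # combined aptitude scores (X)
y = [7, 8, 10, 8, 6, 9, 6, 9, 8, 6]         # job performance scores (Y)

n = len(x)
b = (n * sum(a * c for a, c in zip(x, y)) - sum(x) * sum(y)) / \
    (n * sum(a * a for a in x) - sum(x) ** 2)
a_int = sum(y) / n - b * (sum(x) / n)       # a = Y-bar - b * X-bar

print(round(b, 3))               # 0.224
print(round(a_int, 3))           # 4.849 (the answer key's 4.8552 uses b rounded to 0.224)
print(round(a_int + b * 15, 3))  # 8.216, the predicted Y-hat for X = 15
```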

3.6 Norms
We alluded earlier in the book to the fact that some measures may be
norm-referenced while others are criterion-referenced. We pointed out
that with norm-referenced measures each test-taker’s performance is
interpreted with reference to a relevant standardisation sample or
norm group. Criterion-referenced measures on the other hand
compare the test-taker’s performance to the attainment of a defined
skill or content. The last of the basic concepts in testing that you need
to understand before studying the rest of the book is norms. We will
discuss definitions and types of norms in this section. Other aspects
concerning the use of norms and interpretation of test scores are
covered in Chapters 5 and 16.

3.6.1 The standard normal distribution


Many human characteristics measured in psychology are assumed to
be normally distributed in the population. The normal distribution,
which is constantly referred to in psychometrics, is the standard
normal (bell-shaped) distribution, which has a mean of zero and a
standard deviation of 1.
Raw scores obtained by test-takers on psychological measures
have little or no meaning. In order to make the interpretation more
meaningful, these raw scores are converted to normal (standardised)
scores through statistical transformation.

A norm can therefore be defined as a measurement against which


an individual’s raw score is evaluated so that the individual’s
position relative to that of the normative sample can be determined.

The procedure for establishing such normative samples or norm


groups is of crucial importance in any selection procedure. Let us
briefly have a look at norm groups.

3.6.2 Establishing norm groups


Norm groups consist of two subgroups, namely the applicant pool and
the incumbent population. The choice of such a norm group has to be
representative of both the applicant pool and the incumbent population
as well as appropriate for the position for which the assessment is
conducted. The similarities between the norm group, on the one hand,
and the applicant and incumbent groups, on the other hand, are
established by means of comparative aspects such as ethnicity,
gender, age, and educational background (Society for Industrial and
Organisational Psychology of South Africa (SIOPSA), 2006b).
This can be explained in more practical terms as follows: Say, for
instance, we want to establish a norm group as the comparative base
for our pilot selection test battery. The norm group should then be
selected from both the pool of applicants for pilot training (the
applicant group – not yet competent) and the population of licensed
pilots (the incumbent group – already competent). The norm group
should thus be representative of these two groups, as is reflected by
similar proportions of the biographic data of these two groups.
The entire test battery is then completed by the norm group.
Standardised or normal scores (any one of the types of scores
explained below) are calculated for each of the tests in the battery.
New applicants’ test scores can now be compared to these newly
established standardised (normal) scores.

3.6.3 Co-norming of measures


It often happens that an assessment practitioner wants to compare
scores obtained on different but related measures, in order to test for
possible learning or memory effects. This is where the practice of co-
norming comes into play.

Co-norming entails the process where two or more related but


different measures are administered and standardised as a unit
on the same norm group.

This practice can address a range of potential issues such as test-


order effects (between such measures); learning and memory effects;
gender, age, and education effects; and variations of scaling format
effects.
Co-norming may yield several practical benefits where the same
norm group is used for comparable, but different, measures. A range of
different effects can be tested for, such as test-order, learning, and
memory effects. These issues cannot be addressed if different norm
groups are used. The problem where obtained effects are merely
ascribed to sample differences can now be overcome by using a
common sample group.
A recent study that used co-norming in an attempt to test for
gender, age, and education effects is that of Kern, Nuechterlein,
Green, Baade, Fenton, Gold et al. (2008), which used the Consensus
Cognitive Battery developed by the National Institute of Mental
Health's (NIMH) Measurement and Treatment Research to Improve
Cognition in Schizophrenia (MATRICS) initiative.
This battery includes 10 independently developed tests that are
recommended as the standard battery for clinical trials of cognition-
enhancing interventions for schizophrenia. In this case age, education,
and gender effects were detected. Another, more dated study is that
of Zhu and Tulsky (2000) where the WAIS-III and the WMS-III scores
were evaluated using the same WMS-III standardisation sample. In
this instance the study detected significant test-order effects, but only
with small effect sizes.

3.6.4 Types of test norms


We will discuss the most commonly used norm scores. The
description of the norm scales is based on Anastasi and Urbina (1997)
and Smit (1996). You are referred to these references for a detailed
discussion of the topic.
3.6.4.1 Developmental scales
The rationale of developmental scales is that certain human
characteristics increase progressively with increases in age and
experience.
Mental-age scales: A so-called basal age is computed, that is, the
highest age at and below which a measure was passed. A child’s
mental age on a measure is the sum of the basal age plus additional
months of credit earned at higher age levels. The development of a
child with a mental age of 10 years corresponds to the mental
development of the average 10-year-old child, no matter what his or
her chronological age.
Grade equivalents: Scores on educational achievement measures
are often interpreted in terms of grade equivalents. Used in a school
setting, a pupil’s grade value, for example, is described as equivalent
to seventh-grade performance in arithmetic, eighth-grade in spelling,
and fifth-grade in reading. It is used especially for expressing
performance on standardised scholastic measures. Drawbacks of this
scale are that scale units are not necessarily equal and that they
represent the median performance with overlapping from one grade to
another.
3.6.4.2 Percentiles
A percentile rank score is the percentage of people in a normative
standardisation sample who fall at or below a given raw score. If an
individual obtains a percentile score of 70, it means that 70 per cent of
the normative population obtained a raw score at or below that of the
individual. Percentiles are frequently used for describing individual test
performance. The 50th percentile corresponds to the median. The
25th and 75th percentiles are known as the first (Q1) and third (Q3)
quartiles respectively as they cut off the lowest and highest quarters of
the normal distribution. Percentile rank scores can be readily obtained
from frequency tables generated in modern statistical packages (e.g.
SPSS). Alternatively, percentile rank scores can be derived using
conversion tables set up for this purpose. These tables estimate the
placement of raw scores in the score distribution by relying on a raw
score conversion to standard scores (e.g. z-scores, which will be
discussed later).
Percentiles should not be confused with percentages (%). The
latter are raw scores expressed in terms of percentage correct
answers, while percentiles are derived scores, expressed in terms of
percentage of persons surpassing a specific raw score.
The disadvantage of percentiles is the inequality of their scale
units, especially at the extreme ends of the distribution. In addition,
percentile ranks are ordinal level measures, and cannot be used for
normal arithmetic calculations. Percentile ranks calculated for various
variables and on various populations are not directly comparable.
3.6.4.3 Standard scores
Standard scores can be classified as z-scores, linearly transformed z-
scores, and normalised standard scores.
Z-scores: A z-score expresses an individual’s distance from the mean
in terms of standard deviation units. The z-score for a particular raw
score is easily calculated by subtracting the mean from it and dividing
the result by the standard deviation (s). A raw score equal to the mean
is equal to a z-score of zero. Positive z-scores indicate above-average
performance and negative z-scores below-average performance.
When calculating z-scores, they routinely range from
approximately –3.0 to +3.0. Advantages of z-scores are that they
represent interval level measurements and may thus be statistically
manipulated. The frequency distribution of z-scores has the same
form as the raw scores on which they are based. A disadvantage is
that about half of the z-scores in a distribution have negative values.
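In symbols, the z-score calculation described above can be written as (using the same notation as the earlier formulae):

z = (X − X̄) / s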
Linearly transformed standard scores: To eliminate the
disadvantages of z-scores the following linear transformation can be
done:
– multiply the z-value by a constant (A) (to compensate for the
limited range of scores); and
– add a constant (B) (to eliminate negative scores).
In formula form this would appear as follows:

transformed score = Az + B
formula 3.11
The disadvantages are that these scores are less evident than
ordinary z-scores and are statistically more complex. Three frequently
used standard scores are McCall’s T-score, stanine scales, and sten
scales.

McCall’s T-score: To eliminate negative values, a transformation to a
more convenient standard scale is done using McCall’s T-score,
where the mean is equal to 50 and the standard deviation to 10.
Standard scores, z, are transformed to T as follows:

T = 10z + 50
formula 3.12

Stanine scale: Another well-known and frequently used transformed


scale is the stanine (a contraction of ‘standard nine’), with a range
from 1 (low) to 9 (high), a mean of 5, and a standard deviation of 1.96.
The normal distribution curve percentages which fall into each of the
nine categories are given in Table 3.3. Standard scores, z, are
transformed to stanines as follows:

stanine = 1.96z + 5
formula 3.13

Some advantages of stanine scales are the following:
– scale units are equal
– they reflect the person’s position in relation to the normative sample
– performance in rank order is evident
– they are comparable across groups
– they allow statistical manipulation.
The disadvantage is that stanine scales are approximations because
they have only 9 scale units.

Sten scale: The rationale for the sten scale (a contraction of ‘standard
10’) is the same as for the stanine scale except that it consists of 10
scale units. The mean is 5.5 and the standard deviation 2. The normal
distribution curve percentages that fall into each of the 10 categories
are given in Table 3.4. The sten scale has the same advantages and
disadvantages as the stanine scale. Standard scores, z, are
transformed to sten scores as follows:

sten = 2z + 5.5
formula 3.14
Table 3.3 Normal curve percentages for the stanine scale

STANINE 1 2 3 4 5 6 7 8 9

PERCENTAGE 4 7 12 17 20 17 12 7 4
Table 3.4 Normal curve percentages for the sten scale

STEN 1 2 3 4 5 6 7 8 9 10

PERCENTAGE 2 5 9 15 19 19 15 9 5 2

Normalised standard score: Normalised standard scores are


standard scores that have been transformed to fit a normal
distribution. This is done if there is a reason to assume that the
particular human attribute (e.g. intelligence) is normally distributed in
the population. If the original distribution of raw scores was skewed, z-
values based on the table for normal distribution curves are allocated
to different raw scores than would have been the case if the distribution of
test scores had been normal. The advantage is that the distribution of
test scores corresponds to the normal distribution. The disadvantages
are the same as for z-scores as well as the fact that normalised
standard scores change the form of the original distribution of raw
scores. Score normalisation involves a complex process (see Price,
2017) that is routinely conducted using statistical software prior to
calculating the aforementioned linear transformations.
3.6.4.4 Deviation IQ scale
This scale, which is used by well-known individual intelligence
measures (see Chapter 10), is a normalised standard score with a
mean of 100 and a standard deviation of 15. Some of the advantages
are that this scale is easily comprehensible and interpretable, and that
it is suitable for age levels above 18. The disadvantage is that it is not
directly comparable with transformed standard scores, because the
standard deviations differ. If measure ABC has a mean of 100 and a
standard deviation of 10, a score of 130 is three standard deviation
units above the mean. If measure PQR also has a mean of 100 but a
standard deviation of 15, a score of 130 is only two standard deviation
units above the mean.
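To illustrate how these scales relate, the short sketch below converts a single raw score to the norm scores discussed in this section; the raw-score mean and standard deviation are invented for the example:

```python
# Convert one raw score to various norm scores.
# The raw-score mean and SD below are invented for illustration.
raw, mean, sd = 65, 50, 10

z = (raw - mean) / sd        # z-score = 1.5
t = 10 * z + 50              # McCall's T-score = 65.0
stanine = 1.96 * z + 5       # stanine ≈ 7.9 (rounded to 8 in practice)
sten = 2 * z + 5.5           # sten = 8.5 (rounded to 9 in practice)
deviation_iq = 15 * z + 100  # deviation IQ = 122.5
print(z, t, stanine, sten, deviation_iq)
```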
Figure 3.13 Interrelationship between various types of test norm scores

3.6.5 Interrelationships of norm scores


The choice of a norm score for a newly developed measure depends
mainly on personal preference. Standard scores and the deviation IQ
have generally replaced other types of norm scores, such as
percentiles, due to the advantages they offer (as mentioned in Section
3.6.4) with regard to the statistical treatment of test data. Provided
they meet certain statistical requirements, each of these scores can
be converted to any of the others.
An inspection of Figure 3.13 will show you how the various types of
norm scores are interrelated. Note the relative position of the mean
and standard deviation of the standard normal distribution with
reference to the various types of norm scores.

3.6.6 Setting standards and cut-off scores


As you have seen in Section 3.6 so far, the use of norms to interpret
test scores usually involves comparing an individual to a group of
people. That is, a norm score tells you an individual’s standing in
relation to the norm group. There is another way in which test scores
can be interpreted. Instead of finding out how an individual has
performed in relation to others, you can compare performance to an
external criterion (standard). For example, in the psychometrics
examination that is set by the Professional Board for Psychology, the
passing standard has been set at 70 per cent. Someone who obtains
a mark of 60 per cent may have performed better than a classmate
who obtained 50 per cent (norm-referenced comparison), but would
fail the examination nonetheless, as his or her percentage is below
that of the 70 per cent standard (cut-off) set.
Particularly in settings where large numbers of people are
assessed, it is necessary to set cut-off scores to determine who
passes or fails the measure, for accepting or rejecting job applicants,
or for deciding who does or does not receive treatment or further
assessment. The process by which cut-off scores are determined in
relation to a criterion for a measure involves ‘a complex set of legal,
professional, and psychometric issues’ (Murphy & Davidshofer, 1998,
p. 105). One way of setting cut-off scores is to draw up an
expectancy table that tabulates performance on the predictor
measure against the criterion variable. If we take the example of the
machine operators that we have used throughout this chapter, Table
3.5 represents an expectancy table that shows the relationship
between scores on the aptitude measure and job performance ratings,
which have been classified as successful or unsuccessful.
From an expectancy table you can determine the probability that a
person who obtains a certain aptitude score will be a successful
machine operator, for example. A person who obtains an aptitude
score of 5 has only a 10 percent probability of being successful, while
a person who obtains a score of 18 has a 90 percent probability. An
expectancy table can also be represented graphically by plotting the
relation between performance on the predictor and success on the
criterion.
Table 3.5 Expectancy table
APTITUDE SCORE   NUMBER OF PEOPLE   JOB RATING: SATISFACTORY (%)   JOB RATING: UNSATISFACTORY (%)
16–20            10                 90                             10
11–15            8                  80                             20
6–10             15                 30                             70
Below 5          10                 10                             90
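A minimal sketch of how an expectancy table such as Table 3.5 could be used to look up a new applicant's probability of success; the band boundaries follow the table, with the bottom band taken to include a score of 5 (as in the text's example):

```python
# Probability of a satisfactory job rating, read off Table 3.5
bands = [(16, 20, 0.90), (11, 15, 0.80), (6, 10, 0.30), (0, 5, 0.10)]

def p_success(aptitude_score):
    """Return the probability of success for the band containing the score."""
    for low, high, p in bands:
        if low <= aptitude_score <= high:
            return p
    return None

print(p_success(18))  # 0.9
print(p_success(5))   # 0.1
```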

The main advantage of cut-off scores and expectancy tables is
that they provide an easily understandable way of interpreting the
relationship between test scores and probable levels of success on a
criterion (Murphy & Davidshofer, 2005). There are, however, pitfalls
associated with using cut-off scores and expectancy tables. In the first
instance, correlation data can be unstable from sample to sample, so
large sample sizes need to be used when establishing cut-off scores
to avoid this. A second important consideration is related to the
magnitude of the correlation between the predictor and criterion
variables. The higher the relationship, the more faith we will have in
the accuracy of our predictions. Thirdly, a criterion changes over time
(e.g. as job requirements change); this will influence the relationship
between the test scores (predictor) and the criterion and will impact on
the accuracy of decisions based on these scores. Fourthly, throughout
this book, assessment practitioners are encouraged to base a
decision on more than one test score. Thus, even when cut-off scores
are used, attempts should be made to have a band of cut-off scores,
rather than a single cut-off score. Furthermore, other assessment
information should be considered alongside the predictive information
obtained from cut-off scores and expectancy tables before any
decision is reached.
A final word of caution regarding the use of cut-off scores to predict
performance on a criterion: A measure may have differential predictive
validity in relation to a criterion for various subgroups (e.g. gender,
culture). In such an instance, if a cut-off score is determined for the
whole group, one or other subgroup could be discriminated against.
Various models have been proposed to decrease selection bias and
increase the fairness of selection decisions that are made, especially
for minority groups. There is no easy solution to eliminating bias,
however, as statistics can only go so far, and cannot correct social
inequalities. The real issue is to try to balance conflicting goals such
as providing equal employment opportunities, maximising success
rates and productivity, and redressing past imbalances by obtaining a
better demographic spread of people in the workforce through
providing preferential opportunities for those who have been
historically disadvantaged. Open discussion and successive
approximations are required to achieve this (Anastasi & Urbina, 1997).

3.6.7 Issues with norm use


All standardised tests should have norm tables, and these are
routinely provided to test users in test manuals and/or in online testing
suites. Although it is generally an excellent idea to rely on test norms
to make better decisions about test-takers, there are a few issues and
guidelines that require consideration. As far as possible, use relevant
norms. It does not make sense to rely on norms generated for one
particular target population (e.g. qualified engineers in England) in an
unrelated context (e.g. to select students for entry into a graduate
training programme in South Africa). Related to this issue, it may be
useful to generate your own local norms for a specific company or
industry, given that it normally increases the relevance of norms to the
test-taker sample. Furthermore, norm tables should be recent (and
not outdated) and updated on a continuous basis. Finally, although the
practice of using different test norms for various demographic groups
(e.g. race and/or gender groups), also known as subgroup norming,
is generally considered unfair employment practice (and therefore
illegal) in many countries (e.g. United States of America), in South
Africa it is considered legal. All psychometrists and psychologists
should be able to construct and use norm tables, given that this task
represents a core competency in our profession.

❱❱ ETHICAL DILEMMA CASE STUDY 3.1

Dr Phumzile Mthembu is a specialist consultant in psychological assessment and


selection. Dr Mthembu was requested by a mining company to select a large number
(>200) of management trainees from different culture groups to attend an
accelerated management development programme. Dr Mthembu used a measure to
test applicants’ learning potential and found that there were significant differences in
the culture groups’ mean scores and standard deviations. Should Dr Mthembu reveal
this to the company, or should he just make statistical corrections on the respective
groups’ test scores and proceed with the selection?

Who are the potential stakeholders that need to be informed about
the test’s failure to show similar central tendency and dispersion
between the respective cultural groups?

3.7 Conclusion
This chapter introduced you to the basic concepts in psychological
measurement. You learned about the four types (levels) of
measurement, and different methods of measurement scaling. You
were also introduced to the basic statistical concepts of displaying
data, and to measures of central tendency, variability, and association.
The chapter ended with a discussion of the different types of norm
scores, and procedures for applying norms in assessment settings.
What you have learned in this chapter will enable you to establish
whether your data meet the criteria for applying particular statistical
analyses (e.g. in normally distributed data versus distribution-free
data). This is an important consideration that is often overlooked in
psychological and/or social research. Besides this, measurement data
should also meet reliability and validity requirements. These two
concepts will be discussed in more detail in the next two chapters.

ANSWERS TO SELF-ASSESSMENT EXERCISES 3.1–3.6


3.1 (a) mean score in column xo = 6.2
(b) modes for column xe = 6 and 7
(c) median position for column X = 5.5 and median value = 13.5
3.2 (a) range for column X = 18 − 5 = 13
(b) range for column Y = 10 − 6 = 4
3.3 (a) variance for scores in column xo = 31.6 ÷ 9 ≈ 3.51
3.4 (a) standard deviation of the obtained variance in column xe = √4.72 ≈ 2.17
3.5 (a) Pearson correlation between scores of columns xo and xe ≈ 0.82
3.6 (a) values of a and b respectively 4.8552 and 0.224 for ∑X and ∑Y
(b) the value of Ŷ = 8.2152

OTHER INTRODUCTORY TEXTBOOKS


Cohen, R.J. & Swerdlik, M.E. 2010. Psychological testing and assessment: An
introduction to tests & measurement. 7th ed. Boston: McGraw-Hill.
Domino, G. & Domino, M.L. 2006. Psychological testing: An introduction. 2nd ed.
Cambridge, New York: Cambridge University Press.
Hogan, T.P. & Cannon, B. 2003. Psychological testing: A practical introduction. New
York: Wiley & Sons.

ADVANCED READING
Cronbach, L.J. 1990. Essentials of psychological testing. 5th ed.
New York: Harper & Row.
Gregory, R.J. 2007. Psychological testing: History, principles and applications. 5th ed.
Boston: Pearson Education.
Groth-Marnat, G. 2003. Handbook of psychological assessment. 4th ed. Hoboken, NJ:
John Wiley & Sons.
Kerlinger, F.N. & Lee, H.B. 2000. Foundations of behavioral research. 4th ed. Fort
Worth, TX: Harcourt College Publishers.
Miller, L.A., Lovler, R.L. & McIntire S.A. 2011. Foundations of psychological testing: A
practical approach. 3rd ed. Thousand Oaks, CA: Sage.
Murphy, K.R. & Davidshofer, C.O. 2005. Psychological testing: Principles and
applications. 6th ed. Upper Saddle River, NJ: Prentice Hall.
Chapter 4
Reliability: Basic concepts and measures
GERT ROODT AND FRANCOIS DE KOCK

CHAPTER OUTCOMES
Once you have studied this chapter you will be able to:
❱ define ‘reliability’
❱ name and describe the different types of reliability
❱ interpret a reliability coefficient
❱ explain which factors may affect reliability coefficients.

4.1 Introduction
As discussed in the previous chapter, all measures used in
psychological assessment contain some form of measurement error.
Even if the characteristic we are trying to measure with Test A, for
example, remains constant, scores from this test are likely to fluctuate.
Let’s look at some example data to illustrate. As a psychologist at a
packaging organisation you wish to use an aptitude measure
consisting of 20 items to assist you in the selection of machine
operators. You decide to assess 10 current operators to determine the
reliability of the measures. On both the aptitude test and the
performance evaluation, higher scores indicate more favourable
results. In Table 4.1 below we have a fictitious data set of 10
machine operators. The column on the extreme left presents the
individual machine operators (A–J); the second and third columns
present their aptitude scores for odd- and even-numbered items, and
the fifth and sixth columns their combined (total) scores at time one
(T1) and time two (T2) respectively. We will use this example to do
some very basic calculations to illustrate the different concepts that we
want to introduce.
One of the best ways to get to grips with the topics discussed in
this chapter is to try out some of the calculations. In order to gain a
better understanding of the concepts that we introduce in this and the
following chapter you can use the data provided in Table 4.1 to do the
calculations yourself.
Table 4.1 Aptitude scores of machine operators

OPERATOR   APTITUDE TEST SCORES
           Score for odd   Score for even   Product of    Total score   Total score   Product of
           items (xo)      items (xe)       xo·xe         T1 (X1)       T2 (X2)       X1·X2
A          5               6                30            11            12            132
B          8               10               80            18            16            288
C          7               7                49            14            15            210
D          8               9                72            17            17            289
E          7               6                42            13            14            182
F          6               8                48            14            13            182
G          5               4                20            9             9             81
H          8               7                56            15            14            210
I          6               5                30            11            13            143
J          2               3                6             5             6             30
Sums       ∑xo = 62        ∑xe = 65         ∑xoxe = 433   ∑X1 = 127     ∑X2 = 129     ∑X1X2 = 1747
Also: ∑(xo)² = 416; ∑(xe)² = 465; ∑xo∑xe = 4030; ∑X1∑X2 = 16383
4.2 Reliability
In the previous chapter we defined measurement as the process of
assigning numbers to objects according to clearly specified rules. If,
for example, we obtain repeated measures of the same object, these
measures should provide us with a fairly consistent result. This
measurement process entails that we have a clear conception of at
least three aspects, namely:
– what the entity is that we want to measure (e.g. weight or height)
– what exactly the nature of the measure is that we want to use (e.g. the scale or ruler)
– the application of the rules on how to measure the object (e.g. with or without clothes, shoes).
In order to obtain reliable measures of these objects we therefore
have to apply or consider these three aspects in a consistent manner.
Let’s have a closer look at what reliability is precisely.

4.2.1 Defining reliability


Consider the following scenario: one day you travel along a national
highway between two cities and the speedometer of your car indicates
120 km/h. If you travelled on the same highway again and wanted to
complete your journey in the same time, you would drive at 120 km/h
again. You would not be pleased if the car’s speedometer indicated
120 km/h when, in actual fact, you were only travelling at 100 km/h.
This is what reliability is about: a true speed of 120 km/h must always
be indicated as ± 120 km/h (within a certain error margin) on the
measure (i.e. speedometer), not 100 km/h one day and 130 km/h the
next.
From the above discussion we can deduce that reliability is linked
to consistency of measurement. Thus:

The reliability of a measure refers to the consistency with


which it measures whatever it measures.
Test score reliability is important because it affects the dependability
of scores we use to make decisions about people. Using unreliable
tests can lead to incorrect decisions as these tests contain substantial
‘noise’, in addition to reflecting the characteristic they are supposed to
measure. The Employment Equity Act (No. 55 of 1998) specifies,
among other things, that psychological measures or other similar
assessments need to meet two technical criteria, namely those of
reliability and validity. So it is important that you know what reliability
is and also understand how to determine the reliability of assessment
instruments.
However, as indicated in the previous example, consistency
always implies a certain amount of error in measurement (random
error and systematic error were introduced in the previous chapter). A
person’s performance in one administration of a measure does not
reflect with complete accuracy the ‘true’ amount of the trait that the
individual possesses. There may be other systematic or chance
factors present, such as the person’s emotional state of mind, fatigue,
noise outside the test room, etc., which may affect his or her score on
the measure.
We can capture the true and error measurement in equation form
as follows:
X = T + E
formula 4.1
where:
X = observed score (the total score)
T = proportion true score (reliability of the measure)
E = proportion error score (unexplained variance).

If we obtained the test scores of the target population, the variance
of the observed test scores (X) would be expressed as follows in
terms of true (T) and error (E) variance:

s²X = s²T + s²E
formula 4.2
Reliability (rxx) can be defined as the ratio of true score variance to
observed score variance:

rxx = s²T / s²X
formula 4.3

Assume that, for a measure, the:
– true score variance = 17
– error score variance = 3
– observed score variance = 20 (i.e. 17 + 3).

If this information were substituted into Formula 4.3, the reliability
of the measure would be computed as 17 ÷ 20 = .85.
You might well ask now: How do we compute the reliability of
scores from a measure? True score is a theoretical concept. Do we
ever know a person’s true score? The answer is no. We use observed
data to compute the reliability of tests’ scores. The numerical
expression for reliability is a reliability coefficient, which is nothing
more than a correlation coefficient between parallel measures (e.g.
parallel across time intervals, items, raters, etc.). As an aside, you
should note that it is more appropriate to refer to the reliability of
scores from/of a measure, rather than ‘the reliability of the test’. This is
because reliability does not reside in the test itself but, rather, is also a
function of its use (e.g. the sample, etc.). We will address this issue
later in this chapter.

4.2.2 Types of reliability


We will now consider the five types of reliability coefficient. These are
test-retest, alternate-form, split-half, inter-item (Kuder-Richardson and
Coefficient Alpha), as well as inter- and intra-scorer reliability
(Anastasi & Urbina, 1997; Huysamen, 1996a; Smit, 1996). These
different indices exist because they each help to address a particular
dimension of consistency that might be important to a researcher’s
study.
4.2.2.1 Test-retest reliability
An obvious method for determining the reliability of a measure is to
administer it twice to the same group of test-takers. The reliability
coefficient in this case is simply the correlation between the scores
obtained on the first (T1) and second (T2) application of the measure.
This coefficient is also called a coefficient of stability. Although
straightforward, the test-retest technique has a number of drawbacks.
The testing circumstances may vary in terms of the test-takers
(emotional factors, illness, fatigue, worry); the physical environment
(different venue, weather, noise); or even testing conditions (the
venue, instructions, different instructors), which may contribute to
systematic error variance. Transfer effects, such as practice and
memory, might play a role on the second testing occasion. For most
types of measures, this technique is not appropriate for computing
reliability.

SELF-ASSESSMENT EXERCISE 4.1

In Table 4.1 you will find two sets of measures for the aptitude total
scores (at time 1 and at time 2).
(a) Calculate the test-retest reliability (coefficient of stability) for the
aptitude measure.
(b) Is this coefficient ‘acceptable’ in your view?

4.2.2.2 Alternate-form reliability


In this method two equivalent forms of the same measure are
administered to the same group on two different occasions. The
correlation obtained between the two sets of scores represents the
reliability coefficient (also called a coefficient of equivalence). It
must be noted that the two measures should be truly equivalent, that
is, they should have the same number of items; the scoring procedure
should be exactly the same; they must be uniform in respect of
content, representativeness, and item difficulty level (see Chapter 6);
and the test mean and variances should be more or less the same.
Because the construction of equivalent tests is expensive and time-
consuming, this technique is not generally recommended unless it is
crucial to have multiple equivalent forms.
4.2.2.3 Split-half reliability
This type of reliability coefficient is obtained by splitting the measure
into two equivalent halves (after a single administration of the test)
and computing the correlation coefficient between these two sets of
scores. This coefficient is also called a coefficient of internal
consistency.
How do you split a measure? It is not advisable to take the first and
second halves of the measure, since items tend to be more difficult
towards the end, especially with ability tests where scores tend to drop
off with later items (and other reasons). The most common approach
is to separate scores on the odd- and even-item numbers of the
measure.
The correlation between the odd- and even-item scores is,
however, an underestimation of the reliability coefficient because it is
based on only half the test. The corrected reliability coefficient (rtt) is
calculated by means of the Spearman-Brown formula (see Nunnally &
Bernstein, 1994, p. 351):

rtt = 2rhh / (1 + rhh)
formula 4.4

in which:
rhh = the correlation coefficient of the half-tests.
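As a minimal sketch, the correction is easy to apply in Python; the half-test correlation of .60 used here is purely illustrative:

```python
def spearman_brown(r_hh):
    """Step a half-test correlation up to an estimate of full-test reliability."""
    return 2 * r_hh / (1 + r_hh)

print(round(spearman_brown(0.60), 2))  # 0.75 -> the corrected coefficient
```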

SELF-ASSESSMENT EXERCISE 4.2

In Table 4.1 you will find the scores for the odd-numbered (xo) and
the even-numbered items (xe). These two sets of scores represent
two equivalent halves of the aptitude test.
(a) Calculate the split-half reliability (coefficient of internal consistency) of the
aptitude measure.
(b) Use the Spearman-Brown formula above for correcting your
internal consistency coefficient.
(c) Is this coefficient acceptable?

4.2.2.4 Inter-item consistency


Another coefficient of internal consistency, which is based on the
consistency of responses to all items in the measure (i.e. the inter-
item consistency), is obtained using the Kuder-Richardson method.
The formula for calculating the reliability of a test in which the items
are dichotomous, that is scored either a 1 or a 0 (for right or wrong
responses), is known as the Kuder-Richardson 20 (KR 20) formula
(see Nunnally & Bernstein, 1994, p. 235):

rtt = [n / (n − 1)] [1 − (∑piqi / St²)]
formula 4.5
where:
rtt = reliability coefficient
n = number of items in test
St² = variance of total test score
pi = proportion of test-takers who answered item i correctly
qi = proportion of test-takers who answered item i incorrectly.
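As a minimal sketch, the KR 20 formula can be applied directly to a matrix of dichotomous (0/1) item responses; the tiny data set below is hypothetical, and the total-score variance is computed here with the population (N) convention, which is an assumption:

```python
def kr20(responses):
    """KR 20 reliability for a list of test-takers' 0/1 item responses."""
    n_items = len(responses[0])
    n_people = len(responses)
    totals = [sum(person) for person in responses]
    mean_total = sum(totals) / n_people
    # Total-score variance, population (N) convention assumed
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_people
    sum_pq = 0.0
    for i in range(n_items):
        p = sum(person[i] for person in responses) / n_people  # item difficulty
        sum_pq += p * (1 - p)
    return (n_items / (n_items - 1)) * (1 - sum_pq / var_total)

# Hypothetical example: 4 test-takers answering 3 dichotomous items
data = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0]]
print(round(kr20(data), 2))  # 0.75
```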

SELF-ASSESSMENT EXERCISE 4.3

You have the following data on a fictitious measure called Cognitive
Functioning under Stress (De Beer, Botha, Bekwa & Wolfaardt, 1999):
n = 10
∑pq = 1.8
St² = 12
(a) Compute the reliability of this test using the Kuder-Richardson 20 formula.
(b) Could this measure be included in a selection assessment
battery? Give reasons for your answer.
Some psychological measures, such as personality and attitude
scales, have no dichotomous responses (i.e. no yes/no responses).
They usually have multiple response categories. Cronbach (1951, p.
304) developed the Coefficient Alpha (α), which is a formula for
estimating the reliability of such measures. The formula is as follows
(also see Huysamen, 2006 for an in-depth discussion of Alpha):

α = [n / (n − 1)] [1 − (∑si² / st²)]
formula 4.6
where:
α = reliability coefficient
n = number of items in test
st² = variance of total test score
∑si² = sum of individual item variances.
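A corresponding sketch for Coefficient Alpha with multi-category (e.g. Likert-type) items; the data are hypothetical and the same population-variance convention is assumed:

```python
def cronbach_alpha(responses):
    """Coefficient Alpha for a list of test-takers' multi-category item scores."""
    n_people = len(responses)
    n_items = len(responses[0])

    def variance(values):
        # Population (N) convention assumed, as in the KR 20 sketch
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values) / len(values)

    item_vars = [variance([person[i] for person in responses])
                 for i in range(n_items)]
    total_var = variance([sum(person) for person in responses])
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical example: 4 test-takers answering 3 five-point items
data = [[3, 4, 3], [2, 2, 3], [4, 5, 4], [1, 2, 2]]
print(round(cronbach_alpha(data), 2))  # 0.94
```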

SELF-ASSESSMENT EXERCISE 4.4

You have the following data on a fictitious questionnaire called
Interpersonal Skills Scale (De Beer et al., 1999):
n = 12
∑si² = 4.2
st² = 14
(a) Compute the reliability for this questionnaire using the
Coefficient Alpha formula.
(b) Could this questionnaire be included in a selection assessment
battery? Give reasons for your answer.

4.2.2.5 Inter-scorer reliability


The reliability coefficients discussed so far apply especially to
measures with highly standardised procedures for administration and
scoring. Where this is not the case (e.g. in individual intelligence tests,
projective techniques, in the case of assessment centre ratings, and
answers to open-ended questions as encountered in class tests),
examiner variance is a possible source of error variance. Inter-scorer
(or inter-rater) reliability can be determined by having all the test-
takers’ test protocols scored by two assessment practitioners. The
correlation coefficient between these two sets of scores reflects the
inter-scorer reliability coefficient. Inter-scorer reliability thus refers
to the consistency of ratings between raters, which can also be
expressed in the form of a slightly adapted version of the Coefficient
Alpha:

α = [n / (n − 1)] [1 − (∑Si² / St²)]
formula 4.7
where:
α = inter-scorer reliability coefficient
n = number of items or rating dimensions in a test
St² = variance on all the raters’ summative ratings (total scores)
∑Si² = the sum of the raters’ variances across different rating
dimensions (sub-scores).
4.2.2.6 Intra-scorer reliability
Whereas the inter-scorer reliability refers to the consistency of ratings
between raters, the intra-scorer reliability coefficient refers to the
consistency of ratings for a single rater. Repeated ratings or scores by
the same rater would give an indication of the degree of error variance
between such ratings for that particular rater. The same Coefficient
Alpha formula as above is then slightly adapted in the following
manner:

α = [n / (n − 1)] [1 − (∑Si² / St²)]
formula 4.8
where:
α = intra-rater reliability coefficient
n = number of items or rating dimensions in a test
St² = variance of a rater’s scores on different individuals’
summative ratings (total scores)
∑Si² = the sum of the rater’s variances on each rating dimension
(sub-scores) for different individuals being assessed.
4.2.2.7 Contemporary approaches to estimating reliability
In addition to the methods outlined above, a vast array of
contemporary approaches to reliability estimation has been
developed. Even Cronbach, one of the main early protagonists of
Coefficient Alpha, had serious reservations about the limits of
mainstream methods we heavily rely on today to estimate reliability: ‘I
no longer regard the alpha formula as the most appropriate way to
examine most data’ (Cronbach, 2004, p. 403). In response to these
limitations, various alternative indices have been developed, reflecting
either the classic tradition (based on Classical Test Theory), the
random effects model tradition (based on generalisability theory), the
confirmatory factor analytic tradition or, finally, the Item Response
Theory (IRT) tradition (Putka, 2017; Price, 2017). Estimating reliability
remains important, and this field is developing rapidly.

4.2.3 Factors affecting reliability


We have briefly referred to random error and systematic error in our
discussion on measurement in the previous chapter, where we
indicated that both these factors may affect reliability. Systematic error
basically originates from two broad sources, namely respondent or
test-taker error, as well as administrative error. Let us have a closer
look at each of these sources.
4.2.3.1 Respondent error
In this section two categories of systematic respondent or test-taker
error can be identified (compare Zikmund, Babin, Carr & Griffin, 2010):
Non-response errors / self-selection bias. Non-response errors
occur when respondents do not fully complete their tests or
assessments. This is not a large problem when psychological tests
are administered under standard classroom conditions, because the
test administrator can check if the scales are fully completed. But for
researchers who are working with volunteers this can be a major
source of systematic bias, because research is in most instances conducted among volunteers. Voluntary respondents can choose to participate or to refuse. Why do some participants choose to participate while others refuse or simply fail to respond? Self-selection bias provides an answer to this
question. This occurs when some respondents feel positive about the
research subject while the non-respondents feel negative or indifferent
about the topic and decide on these grounds not to participate.
Response bias. A response bias occurs when respondents decide to
systematically respond in a set or fixed manner to the item or the
question, thereby purposively presenting a skewed picture (compare
Zikmund et al., 2010). Different forms of response bias exist that may
systematically skew the obtained scores (also compare Smith, 2004;
Smith & Roodt, 2003):
– Extremity bias is a type of bias that occurs when a respondent
responds either very positively or very negatively to a particular
question. Opposed to extremity bias is centrality or neutrality
bias where a person constantly opts for the neutral or the central
response options. Considering the use of questions versus
statements, it seems that the use of statements with Likert-type
response scales may amplify extreme responses as opposed to
the use of questions (see Swart, Roodt, & Schepers, 1999).
– Stringency or leniency bias is a type of bias frequently
encountered when raters or assessors are used to generate
scores. These raters can then either be very strict or lenient in
their ratings or assessments of an object.
– Acquiescence bias occurs when a respondent is in agreement
with all statements or questions he or she is asked about. No
clear preferences or dislikes are therefore registered.
– Halo effect occurs where respondents are systematically
influenced by favourable or unfavourable attributes of the object
that they rate or assess. It is argued, for instance, that raters
would rate the subjects that they like more positively.
– Social-desirability bias refers to a very similar response bias
where the respondent reacts in a manner that is socially
desirable or acceptable. The respondent thereby wishes to
create a favourable impression or wishes to present a positive
image of him or herself.
– Purposive falsification. Falsification occurs when respondents
or test-takers purposefully misrepresent facts or deliberately
provide factually incorrect responses. Persons may do this to
prevent embarrassment; to conceal personal information; or
even to protect their privacy. Persons may also do this purely
out of boredom when they, for instance, decide to tick only the second option every time, or simply respond randomly to all the questions.
– Unconscious misrepresentation. Unlike falsification,
misrepresentation is not done on purpose. People may not have
factually correct information; may not be able to recall the
correct information; or may even not understand questions in the
way they were intended. The recorded responses may therefore
be unintentionally incorrect.

More specifically, some intra-individual factors that affect reliability coefficients, according to Anastasi and Urbina (1997), are as follows:
• Whether a measure is speeded. The difference between speed and
power measures was pointed out in Chapter 1. Test-retest and
equivalent-form reliability are appropriate for speeded measures. Split-
half techniques may be used if the split is according to time rather
than items (i.e. half-scores must be based on separately timed parts of
the test).
• Variability in individual scores. Any correlation is affected by the
range of individual differences in the group. Figure 4.1 indicates a
moderate to strong positive correlation for the total group, while the
correlation for the smaller subgroup in the block is close to zero. This
phenomenon is known as range restriction. A reliability coefficient
should be computed for the group for which the measure is intended.
Moreover, it is possible to estimate the reliability for an unrestricted sample by means of a correction (for so-called attenuation) formula; however, this falls outside the scope of this text.
• Ability level. The variability and ability level of samples should also
be considered. It is desirable to compute reliability coefficients
separately for homogeneous subgroups, such as gender, age, and
occupation, instead of for the total heterogeneous sample. If mean
scores are significantly different for these subgroups one can
conclude that there may be some form of systematic error variance.

Figure 4.1 Illustration of range restriction of scores

The second broad source of systematic error variance is administrative error:
4.2.3.2 Administrative error
Variations in administrative procedures normally occur when non-
standardised assessment practices are followed. Containing these
sources of error variance is of the utmost importance if assessments
are conducted in standard classroom conditions, which is also the
case with assessment centres. A host of different causes may exist
that can be grouped under the following headings (compare
Schlebusch & Roodt, 2008):
• Variations in instructions. It may happen that instructions to
respondents or test-takers are inconsistent or not provided in a
standardised manner. Test manuals or instruction booklets are
important aids in assisting the test administrator or researcher in
providing standardised instructions.
• Variations in assessment conditions. Test administrators should
also ensure that tests are conducted under standardised conditions.
Typically, reference is made to ‘standard classroom conditions’. Any
deviation from these ‘standard conditions’ would therefore
compromise the consistency of conditions and the assessment
context.
• Variations in the interpretation of instructions. The variation in the
interpretation of instructions of the test, or assessment in general, or
with reference to specific sections of a questionnaire or test, may
result in variable assessment outcomes. Care should therefore be
taken to ensure that test-takers (testees) or respondents understand
all oral or written instructions in exactly the same way.
• Variations in the scoring or ratings. Most test or assessment
instruments have clear instructions on how responses on the
instrument should be scored or rated. If computer scoring is used,
care should be taken to ensure that the data is correctly edited,
coded, and processed. Any deviation from these scoring or rating
instructions will also result in variations in assessment outcomes.

Utmost care should therefore be taken to standardise assessment conditions and administration procedures, and to ensure that instructions are free of ambiguity. In the age of computer-
administered (‘online’) assessments, many of the sources of
administrative error associated with traditional pencil-and-paper
tests have been removed, given the automated nature of the
assessment. However, other sources of error may have been
introduced. Can you think of possible examples? You will learn more
about measurement error in Section 4.2.4.2.

4.2.4 Interpretation of reliability


Now that you have learned about reliability and the various methods
for determining reliability, it is time to consider some practical aspects
of the assessment of reliability of measures. You must be able to
answer questions such as:
When is a reliability coefficient acceptable in practice?
How are reliability coefficients interpreted?
What is meant by standard error of measurement?
4.2.4.1 The magnitude of the reliability coefficient
When is a reliability coefficient acceptable in practice? The answer
depends on what the measure is to be used for. Anastasi and Urbina
(1997) state that standardised measures should have reliabilities
ranging between .80 and .90. Huysamen (1996a) suggests that the
reliability coefficient should be .85 or higher if measures are used to
make decisions about individuals, while it may be .65 or higher for
decisions about groups. According to Smit (1996), standardised
personality and interest questionnaires should have a reliability of .80
to .85, while that of aptitude measures should be .90 or higher.
4.2.4.2 Standard Measurement Error
How are reliability coefficients interpreted? Another way of expressing
reliability is through the Standard Measurement Error (SME). SME can
be used to interpret individual test scores in terms of the reasonable
limits within which they are likely to vary as a function of measurement
error. It is independent of the variability of the group on which it was
computed. The SME can be computed from the reliability coefficient of
a test, using the following formula:

formula 4.9:   SME = st √(1 − rtt)
where:
st = SD of the test score
rtt = reliability of the test scores.
Like any standard deviation, SME can be interpreted in terms of the
normal distribution frequencies (see Section 3.5). If, for example, SME
= 4, the chances are 68 per cent that a person’s true score on the test will lie within 4 points on either side of his or her measured/observed score (e.g. between 26 and 34 if the measured score is 30).

SELF-ASSESSMENT EXERCISE 4.5


If a measure has a standard deviation of 9 and a reliability coefficient of .88:
(a) What would the SME be?
(b) What does this mean in practical terms?
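
If you would like to check your answer, the following minimal Python sketch (the function name is our own illustrative choice) applies formula 4.9 to the values in Exercise 4.5:

    import math

    def sme(sd, reliability):
        # Standard Measurement Error (formula 4.9): st * sqrt(1 - rtt)
        return sd * math.sqrt(1 - reliability)

    value = sme(9, 0.88)
    print(round(value, 2))   # 3.12
    # 68 per cent limits around an observed score of, say, 30:
    print(round(30 - value, 2), round(30 + value, 2))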

4.2.4.3 Reliability and mastery assessment


In mastery testing or criterion-referenced assessment, there is often
little variability of scores among test-takers. Mastery measures try to
differentiate between people who have mastered certain skills and
knowledge for a specific job or training programme, and those who
have not. The usual correlation procedures for determining reliability
are therefore inappropriate. References to different techniques used in
this regard may be found in Anastasi and Urbina (1997).

» ETHICAL DILEMMA CASE STUDY 4.1

Dr Phumzile Mthembu, our specialist consultant and selection expert, used a managerial aptitude test (which was bought and imported from an overseas test developer, with supporting validation studies conducted in the US) to test a group of applicants from different cultural backgrounds before they underwent the accelerated management development programme. Dr Mthembu found that the managerial aptitude test yielded an overall Coefficient Alpha reliability of just over .70 for the total group. On further investigation it was also found that the reliability of the measure was below .60 for some of the culture groups.

Questions
(a) Can Dr Mthembu just assume that the managerial aptitude test
will also be reliable and valid in the South African context based
on the US validation studies?
(b) Based on these test results, should Dr Mthembu just proceed with
the selection of the best candidates for the programme or should he
first try to establish the causes for the group differences in reliability?
(c) What could these causes for group differences in reliability potentially be?
(d) How may Dr Mthembu potentially increase the levels of reliability?
4.3 Conclusion
This chapter introduced and defined the concept ‘reliability’, which is
an important building block in working with and understanding
psychological measurement data. This chapter introduced different
types of reliability, as well as factors (individual and administrative)
that affect the reliability of measures, and finally how reliability should
be interpreted. What you have learned in this chapter will enable you
to determine whether the data you intend to use is reliable or meets
reliability requirements. Keep in mind that instruments and measures can never be valid without being reliable. Validity will be discussed in more depth in the next chapter.

ANSWERS TO SELF-ASSESSMENT EXERCISES 4.1–4.5


4.1 (a)
The test-retest reliability is: (rounded to two decimals: .95)

(b) Assuming that internal and external conditions remain equal and
that no transfer effects occur, there is high test-retest reliability.
4.2 (a) The split-half reliability (coefficient of internal consistency) is: (rounded to two decimals: .82)
(b) Using the Spearman-Brown formula, the corrected internal consistency coefficient is: 2(.82) / (1 + .82) = .90
(c) Yes, this coefficient is highly acceptable.
4.3 (a) The KR 20 is: (10/9)(1 − 1.8/12) = .944 (rounded to two decimals: .94)
(b) Yes, it can be used because there is little error variance (high reliability).
4.4 (a) The Coefficient Alpha is: (1.091)(.7) = .764 (rounded to two decimals: .76)
(b) Yes, the questionnaire is moderately-highly reliable (considering
the amount of error variance in the questionnaire).
4.5 (a) The SME is: 9 × √(1 − .88) = 3.12
(b) A person’s true score will vary ±3.12 about the obtained score.

OTHER INTRODUCTORY TEXTBOOKS


Cohen, R.J. & Swerdlik, M.E. 2010. Psychological testing and assessment: An
introduction to tests & measurement. 7th ed. Boston: McGraw-Hill.
Domino, G. & Domino, M.L. 2006. Psychological Testing: An introduction. 2nd ed.
Cambridge, New York: Cambridge University Press.
Hogan, T.P. & Cannon, B. 2003. Psychological Testing: A practical introduction. New
York: Wiley & Sons.

ADVANCED READING
Anastasi, A. & Urbina, S. 1997. Psychological testing. 7th ed. Upper Saddle River, NJ:
Prentice-Hall.
Cronbach, L.J. 1951. Coefficient alpha and the internal structure of
tests. Psychometrika, 16(3): pp. 297–334. doi:10.1007/bf02310555
Cronbach, L.J. 1990. Essentials of psychological testing. 5th ed.
New York: Harper & Row.
Gregory, R.J. 2010. Psychological testing: History, principles and applications. 5th ed.
Boston: Pearson Education.
Groth-Marnat, G. 2003. Handbook of psychological assessment. 4th ed. Hoboken, NJ:
John Wiley & Sons.
Kerlinger, F.N. & Lee, H.B. 2000. Foundations of behavioral research.
4th ed. Fort Worth, TX: Harcourt College Publishers.
Miller, L.A., Lovler, R.L. & McIntire, S.A. 2011. Foundations of psychological testing: A
practical approach. 3rd ed. Thousand Oaks, CA: Sage.
Murphy, K.R. & Davidshofer, C.O. 2005. Psychological testing: Principles and
applications. 6th ed. Upper Saddle River, NJ: Prentice Hall.
Nunnally, J.C. & Bernstein, I.H. 1994. Psychometric theory. New York: McGraw-Hill.
Price, L.R. 2017. Psychometric methods: Theory into practice. New
York: Guilford Press.
Putka, D. 2017. Reliability. In J. Farr & N. Tippins (Eds.), Handbook of
employee selection. 2nd ed. (pp. 3–33). Mahwah, NJ: Lawrence Erlbaum.
Chapter 5
Validity: Basic concepts and measures
GERT ROODT AND FRANCOIS DE KOCK

CHAPTER OUTCOMES
Once you have studied this chapter you will be able to:
❱ define ‘validity’
❱ name and describe the three broad types of validity in the
validity triangle and the sub-categories under each of them
❱ provide examples of indices of validity as well as their interpretation
❱ explain which factors may affect validity coefficients.

5.1 Introduction
In the previous chapter we mentioned that the Employment Equity Act
(No. 55 of 1998) specifies that psychological measures and other
assessments should be reliable and valid. Reliability and validity are
two important criteria that all measuring instruments have to meet.
Now let us return to the practical example that we introduced in the
previous two chapters.
As a psychologist at a packaging organisation you wish to use an
aptitude measure consisting of 20 items to assist you in the selection
of machine operators. You decide to assess 10 current operators to
determine the validity of the measures. On both the aptitude test and
the performance evaluation, higher scores indicate more favourable
results.
In Table 5.1 on the next page we have a fictitious data set of 10
machine operators. The shaded column on the extreme left presents
the individual machine operators (A–J), the second and fourth column
their mechanical aptitude scores for Test X and B respectively, and
their job-performance scores (Y) in the sixth column. We will use this
example to do some very basic calculations to illustrate the different
concepts that we want to introduce.
In order to get to grips with the topics that we introduce in this
chapter, you can try out some of the calculations yourself by using the
data provided in Table 5.1 on the next page.
Table 5.1 Aptitude and job-performance scores of machine operators
OPERATOR    MECHANICAL APTITUDE TEST SCORES                JOB-PERFORMANCE
            (TEST X AND TEST B)                            SCORES (Y)
            Test X     X²       Test B     B²              Y        Y²

A           11         121      15         225             7        49
B           18         324      27         729             8        64
C           14         196      15         225             10       100
D           17         289      10         100             8        64
E           13         169      17         289             6        36
F           14         196      13         169             9        81
G           9          81       8          64              6        36
H           15         225      11         121             9        81
I           11         121      7          49              8        64
J           5          25       7          49              6        36

Totals      ∑X = 127   ∑X² = 1747   ∑B = 130   ∑B² = 2020   ∑Y = 77   ∑Y² = 611

∑X∑Y = 9779        ∑X∑B = 16510        ∑B∑Y = 10010
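
If you want to check your hand calculations against software, the following minimal Python sketch (our own illustration) captures the Table 5.1 data and computes the Pearson correlations used in the exercises later in this chapter:

    import numpy as np

    # Table 5.1 data
    x = np.array([11, 18, 14, 17, 13, 14, 9, 15, 11, 5])   # Test X scores
    b = np.array([15, 27, 15, 10, 17, 13, 8, 11, 7, 7])    # Test B scores
    y = np.array([7, 8, 10, 8, 6, 9, 6, 9, 8, 6])          # job performance (Y)

    print(round(float(np.corrcoef(x, y)[0, 1]), 3))   # 0.611 (see Exercise 5.2)
    print(round(float(np.corrcoef(x, b)[0, 1]), 3))   # about 0.63 (see Exercise 5.1)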
5.2 Validity
At this point you should already know what reliability is. One other
requirement of any measure is that it must be valid. In this section we
will focus on the concept of validity.

5.2.1 Defining validity


Let us return to the example of the aptitude test outlined above. From
the reliability calculations that you conducted in Chapter 4 you can
accept the fact that Test X measures consistently (i.e. it is reliable).
Now consider this: What does this instrument really measure – a
person’s mechanical aptitude? Ideally, this is what you would want –
you would not want the instrument to indicate a person’s
comprehension ability or his or her language proficiency. This is what
validity is all about:

The validity of a measure concerns what the test measures and how
well it does so. As such, measurement validity reflects the accuracy of
measurement.

Validity is not a specific property of a measure – a psychological measure is valid for a specific purpose (i.e. it has a high or low
validity for that specific purpose). Therefore, validity is always the
interaction of both the purpose of the instrument and the sample. For
example, an instrument may yield valid scores for a sample from the
norm group, but may yield invalid scores from a different sample and
population (compare Schepers, 1992). Moreover, validity depends on
the way the test scores are used to reach decisions about applicants.
For instance, we can use a test with good measurement validity (for a
particular characteristic) in a way that leads to invalid decisions. One
example would be if we used a well-established intelligence test to
select individuals where differences in intelligence should matter
relatively little, for example, in a position where general cleaning
duties have to be performed. As the test is unrelated to the job
requirements, it would most likely show poor decision validity despite
having good measurement validity (Murphy & Davidshofer, 2005).
In the previous chapter we also mentioned that the measurement
process entails that we should have a clear conception of at least
three aspects, namely:
• What the entity is (e.g. a person’s weight or height) that we want to measure – in other words, the exact purpose of the test or construct that we want to measure.
• What the exact nature of the measure is (e.g. the scale or measuring tape) that we want to use, i.e. how well it would measure the construct.
• Finally, the application of the rules on how to measure the object
(e.g. with or without clothes, shoes).

Different types of validity inform us about these three questions.

5.2.2 Types of validity


There are three types of validity or validation procedures, namely
content-description, construct-identification, and criterion-prediction
procedures (Anastasi & Urbina, 1997). Some authors call them
aspects of validity (Smit, 1996) or types of validity (Huysamen,
1996a). These types of validity follow in a particular logical sequence
to establish a larger ‘validity picture’ of a measure. Figure 5.1 shows a
conceptual diagram of the validity triangle, which represents three
major strategies we use to accumulate validity evidence. Routinely,
researchers work in a clockwise fashion; that is, starting with content
validation, followed by construct-oriented strategies, and finally,
exploring criterion prediction. You will also note that the first two
strategies focus on measurement issues (i.e. internally focused),
whereas the third (i.e. criterion prediction) considers decision validity
(i.e. externally focused). We will now consider these types of validity.
Figure 5.1 The validity triangle

5.2.2.1 Content-description procedures


There are two important aspects when considering the content validity
of a measure:
Face validity
– Face validity is a non-psychometric or non-statistical form of validity. It does not refer to what the test actually measures, but rather to what it appears to measure. It has to do
with whether the measure ‘looks valid’ to test-takers who have to
undergo testing for a specific purpose. In this sense, face validity
is a desirable characteristic for a measure. Sometimes the aim of
the measure may be achieved by using phrasing that appears to
be appropriate for the purpose. For example, an arithmetic-
reasoning test for administrative clerks should contain words like
‘stationery’ and ‘stock’ rather than ‘apples’ and ‘pears’. When you
apply for a job, it is not uncommon to be asked to play a
computerised ‘game’ or simulation to identify your personality traits
(e.g. risk-taking propensity). For it to show face validity, it should
be clear to you how it may measure a characteristic that is
important for the position you are applying for.
– Testees or a panel of subject experts are normally used to
assess the face validity; that is, the ‘look and feel’ of the
measuring instrument. Once a test developer is satisfied with
this content-validity facet, he or she can proceed with the
content-validation procedure.
Content validity
Content validity involves determining whether the content of
the measure (e.g. a video clip of a Situational Judgment Test)
covers a representative sample of the behaviour domain/aspect
to be measured (e.g. the competency to be measured).

Content validity is only partially a non-statistical type of validity: one part refers to a specific procedure followed in constructing a psychological measure. (The different phases of this procedure will be discussed in more detail in Chapter 6.) A frequently used procedure to ensure high content validity is the use of a panel of subject experts to evaluate the items during the test-construction phase. The other part refers to the factor-analytic procedures described further on (see Chapter 7).
Content validity is especially relevant for evaluating achievement,
educational, and occupational measures (also see Schepers, 1992).
Content validity is a basic requirement for domain-referenced/criterion-
referenced and job sample measures. Performance on these
measures is interpreted in terms of mastery of knowledge and skills
for a specific job that are important for employee selection and
classification.
Only when an instrument is content valid can it be elevated to the
next level of validity assessment, namely construct validation.
However, content validity is usually not the most appropriate aspect of
validity to establish for aptitude (as in our above example) and
personality measures. Although the content of aptitude and
personality measures must be relevant and representative of a
specific domain, the content requires validation through criterion-
prediction procedures (see Section 5.2.2.3) if the intended purpose is
to use the measures for some decision in the employment context.
5.2.2.2 Construct-identification procedures
Construct validity
A construct is a theoretical abstraction that serves as a label for sets
of behaviours that appear to ‘go together’ in nature. The following are
just a few examples of such theoretical (intangible) constructs or traits:
intelligence, personality, verbal ability, numerical ability, spatial
perception, eye-hand coordination, or specific personality facets such
as emotional stability and introversion-extroversion. The measures to
be discussed in subsequent chapters concern many possible
theoretical constructs in psychology.

The construct validity of a measure is the extent to which it measures the theoretical construct or trait it is supposed to measure.

Establishing construct validity normally involves a quantitative, statistical analysis procedure. There are a number of different
statistical methods to ascertain whether the measure actually
measures what it is supposed to be measuring. They include factorial
validity and correlation with other tests (to establish convergent and
discriminant validity).
Factorial validity
Factor analysis is a statistical technique for analysing the
interrelationships of variables. The aim is to determine the underlying
structure or dimensions of a set of variables because, by identifying
the common variance between them, it is possible to reduce a large
number of variables to a relatively small number of factors or
dimensions. The factors describe the factorial composition of the
measure and assist in determining subscales. The factorial validity
(Allen & Yen, 1979) of a measure thus refers to the underlying
dimensions (factors) tapped by the measure, as determined by the
process of factor analysis. Different factor analytic procedures can be
used here, such as exploratory factor analyses (EFAs) and
confirmatory factor analyses (CFAs) (refer to Chapter 7). This
procedure is normally used when a new measure is developed, or when an existing measure is applied in a context different from the one in which it was originally validated.
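
As a rough illustration of what a factor analysis does, the following Python sketch (entirely simulated data, not drawn from any real measure) generates item responses driven by two latent factors and recovers the loading structure with scikit-learn's FactorAnalysis:

    import numpy as np
    from sklearn.decomposition import FactorAnalysis

    rng = np.random.default_rng(0)

    # Simulate 500 respondents answering 6 items: items 1-3 load on
    # factor 1 and items 4-6 on factor 2
    factors = rng.normal(size=(500, 2))
    true_loadings = np.array([[0.8, 0.0], [0.7, 0.0], [0.9, 0.1],
                              [0.0, 0.8], [0.1, 0.7], [0.0, 0.9]])
    items = factors @ true_loadings.T + rng.normal(scale=0.4, size=(500, 6))

    fa = FactorAnalysis(n_components=2, random_state=0)
    fa.fit(items)
    print(np.round(fa.components_, 2))   # estimated loadings show two clear factors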
Correlation with other tests: Convergent and discriminant validity
A high correlation between a new measure and a similar, earlier
measure of the same construct indicates that the new measure
assesses approximately the same construct or area of behaviour. The
correlation should be only moderately high, otherwise the measure is
just a duplication of the old one.
A measure demonstrates construct validity when it correlates
highly with other variables with which it should theoretically correlate
and when it correlates minimally with variables from which it should
differ. Campbell and Fiske (1959) describe the former as convergent
validity (sometimes referred to as mono-trait, hetero-method) and the
latter as discriminant validity (also referred to as hetero-trait, mono-
method). This method is normally used to determine whether a new
construct is sufficiently ‘isolated’ from other dissimilar instruments or
constructs.
It would be nearly futile to try to predict some criterion (or construct) with measures that are not construct-valid. Therefore, only once a measure has met the requirements of construct validity can it effectively enter the third validation level.

SELF-ASSESSMENT EXERCISE 5.1

In our example in Table 5.1 you can calculate the correlation between the total score of the aptitude Test X and the scores obtained on the second aptitude Test B.
(a) What correlation coefficient did you get?
(b) Is this aptitude test construct valid?

5.2.2.3 Criterion-prediction procedures


Criterion-prediction validity
Criterion-prediction validity can be defined as follows:

Criterion-prediction validity involves the calculation of a correlation coefficient between predictor(s) and a criterion.

Criterion-prediction validity (also known as criterion-related validity) is determined by means of a quantitative (statistical) procedure. We call
this process criterion-related validation. Two different types of
criterion-prediction validity can be identified, namely concurrent and
predictive validity (Murphy & Davidshofer, 2005). The distinction
between these two types of criterion-prediction validity is based on the
purpose for which the measure is used. The two types of criterion-
prediction validity can be defined as follows (Huysamen, 1996a; Smit,
1996):
• Concurrent validity

Concurrent validity involves the accuracy with which a measure can identify or diagnose the current behaviour or status regarding
specific skills or characteristics of an individual.

This definition clearly implies the correlation of two (or more) concurrent sets of behaviours or constructs. Let us briefly consider the
concurrent sets of behaviours or constructs. Let us briefly consider the
following example:

SELF-ASSESSMENT EXERCISE 5.2

If we assume for a moment that Aptitude Test X and the job-performance measures (Y) were taken at the same point in time,
then we can calculate the concurrent validity of the aptitude test:
(a) What is the concurrent validity in this case?
(b) Is this acceptable or not?

• Predictive validity

Predictive validity refers to the accuracy with which a measure can predict the future behaviour or category status of an individual.

The fact that psychological measures can be used for decision-making is implicit in the concept of predictive validity. Let us have a brief look
at the following exercise:

SELF-ASSESSMENT EXERCISE 5.3

Now let us assume in our above example that the job-performance scores were taken at a different point in time.
(a) What is the validity coefficient now?
(b) Will it be different from Exercise 5.2?
(c) Do you think that the job-performance scores are good (reliable
and valid) criterion scores?
(d) Why do you say this?

• Known-groups validity/differential validity


A measure possesses known-groups validity if it succeeds in
differentiating or distinguishing between characteristics of individuals,
groups or organisations. These entities are grouped based on a priori
differences and then compared against the selected measure. If test
scores discriminate across groups that are theoretically known to
differ, they show known-groups validity (Hattie & Cooksey, 1984). For
instance, a measure of self-efficacy would exhibit known-groups
validity if it can distinguish between different individuals’ levels of self-
efficacy. Or, a personality measure should be able to generate
different personality profiles for two unique individuals. This validity
requirement is often overlooked in the development of measures to
assess group or organisational processes (also see Du Toit & Roodt,
2003; Petkoon & Roodt, 2004; Potgieter & Roodt, 2004). If we return
to our practical example above, we can form two contrasting groups
based on their job-performance scores (low and high). If the mean
aptitude test scores for these two groups are statistically significantly
different, then we can conclude that the aptitude test has differential
validity.
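
A minimal Python sketch of this contrasted-groups logic, applied to the Table 5.1 data, is given below (the split at the mean of Y is our own assumption, and with only ten cases the result is purely illustrative):

    import numpy as np
    from scipy import stats

    x = np.array([11, 18, 14, 17, 13, 14, 9, 15, 11, 5])   # aptitude (Test X)
    y = np.array([7, 8, 10, 8, 6, 9, 6, 9, 8, 6])          # job performance

    # Contrast low and high performers around the mean of Y (7.7)
    low_x, high_x = x[y <= 7], x[y >= 8]

    # A significant mean difference in aptitude between the known groups
    # would support known-groups (differential) validity
    t, p = stats.ttest_ind(high_x, low_x)
    print(round(float(t), 2), round(float(p), 3))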
• Incremental validity
Sechrest (1978) suggests another validity construct, namely
incremental validity, which should be presented in addition to other
validity information. Sechrest argues that if a new test is reducible to
an intelligence or a social-desirability measure, its raison d’être will
most probably disappear. A measure therefore displays incremental
validity when it explains numerically additional variance compared to a
set of other measures when predicting a dependent variable. For
instance, a measure of emotional intelligence would possess
incremental validity if it explained additional variance (in the criterion)
compared to a set of the ‘big five’ personality measures in the
prediction of job performance. Do you think this is indeed the case?

SELF-ASSESSMENT EXERCISE 5.4

If we use both the aptitude tests’ scores in the prediction of the job-
performance scores:
(a) Would Aptitude Test X explain more variance than Aptitude Test B?
(b) What do the results suggest?
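
One way to explore this question is to compare the variance explained by Test X alone with the variance explained when Test B is added. The following Python sketch (our illustration, using the Table 5.1 data; with n = 10 it is purely didactic) does exactly that with scikit-learn:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    x = np.array([11, 18, 14, 17, 13, 14, 9, 15, 11, 5]).reshape(-1, 1)
    b = np.array([15, 27, 15, 10, 17, 13, 8, 11, 7, 7]).reshape(-1, 1)
    y = np.array([7, 8, 10, 8, 6, 9, 6, 9, 8, 6])

    r2_x = LinearRegression().fit(x, y).score(x, y)        # Test X alone
    xb = np.hstack([x, b])
    r2_xb = LinearRegression().fit(xb, y).score(xb, y)     # X and B together

    print(round(r2_x, 3))           # about .37 (see Exercise 5.4)
    print(round(r2_xb - r2_x, 3))   # the increment contributed by Test B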

• Validity generalisation and meta-analysis


Many criterion validity studies are carried out in an organisation in
which the validity for a specific job is to be assessed. These are also
called local validation studies, or primary validation studies. Generally,
the samples are too small to yield a stable estimate of the correlation
between the predictor and criterion. Anastasi and Urbina (1997)
mentioned research done by Schmidt, Hunter, and their co-workers,
who determined that the validity of tests measuring verbal, numeric,
and reasoning aptitudes can be generalised widely across
occupations. Apparently the successful performance of a variety of
occupational tasks is attributable to cognitive skills that are common to
many occupations (Salgado, Anderson, Moscoso, Bertua, De Fruyt &
Rolland, 2003; Schmidt & Hunter, 1998).
Validation studies can be conducted in an incremental way where
the focus ranges from validating a measure in a single organisation to
validation of traits across national and cultural boundaries. These
studies can therefore be:
– firstly, conducted in a specific organisation
– secondly, conducted for a specific industry
– thirdly, conducted across industries in a specific country
– fourthly, conducted across different countries and cultures.

The above categories suggest that validation studies may have different foci and different levels of generalisation. Two approaches to
criterion-related validation that have become popular over the last
few decades are validity generalisation and meta-analysis (Anastasi
& Urbina, 1997). We will only briefly introduce meta-analysis here as
a detailed discussion is beyond the scope of this chapter.
Meta-analysis is a quantitative method of reviewing published
research literature. It is a statistical integration and analysis of
previous research studies’ data on a specific topic. Correlational meta-
analysis is quite commonly used in the industrial psychological field for
assessing test validity.
To conclude this section on criterion-prediction procedures, we
point out another concept, namely cross-validation. No measure is
ever ‘perfect’ after one administration to a group or one sample of the
target population for the measure. It is essential to administer a
second, refined version of the measure, compiled after an item
analysis (see Chapter 6), to another representative normative sample.
Validity coefficients should then be recalculated for this (second)
sample. During cross-validation, shrinkage, or a lowering of the
original validity coefficient, might take place due to the size of the
original item pool and the size of the sample. Spuriously high validity
coefficients may have been obtained the first time due to chance
differences and sampling errors.
5.2.2.4 Possible criterion measures
Any psychological measure described in this book can be a possible
predictor. A criterion, as the name suggests, is a benchmark variable against which scores on a psychological measure are compared or evaluated (the criterion can itself be another psychological measure).
of a measure used to select or classify people is also determined by
its ability to predict their performance on the criterion (e.g. job
performance).
Criterion measures can be either objective (e.g. financial data) or
subjective (e.g. ratings) in nature. A few possible measures for a
criterion are mentioned below. You are referred to other sources, such
as Anastasi and Urbina (1997), for a more complete discussion.
Academic achievement
Academic achievement is one of the most frequently used criteria for
the validation of intelligence, multiple aptitude, and personality
measures. The specific indices include school, college or university
grades, achievement test scores, and special awards.
Performance in specialised training
A criterion frequently used for specific aptitude measures is
performance in specialised training. Think about achievement
performance related to training outcomes for typists, mechanics,
music and art courses, degree courses in medicine, engineering and
law, and many others.
Job performance
The most appropriate criterion measure for the validity of intelligence,
special aptitude, and personality measures is actual job performance.
Any job in industry, business, the professions, government, and the
armed services is suitable. Test manuals reporting this kind of validity
should describe the duties performed and how they were measured.
When validating personality measures, contrasted groups are
sometimes used. For example, social traits may be more relevant and
important for occupations involving more interpersonal contact, such
as executives or marketing personnel, than occupations not requiring
social skills (e.g. office clerks).
Psychiatric diagnoses
Psychiatric diagnoses based on prolonged observation and case
history may be used as evidence of test validity for personality
measures.
Ratings
Ratings by teachers or lecturers, training instructors, job supervisors,
military officers, co-workers, etc. are commonly used as criteria. A
wide variety of characteristics may be rated, such as competency,
honesty, leadership, job performance, and many more. Ratings are
suitable for virtually every type of measure. Although subjective,
ratings are a valuable source of criterion data for test validation – in
many cases they are the only source available. Keep in mind that
raters may be trained to avoid the common errors made during rating
(discussed in Chapter 4) such as ambiguity, halo error, errors of
extremity or central tendency (giving most people average ratings),
and stringency or leniency errors.
Other valid tests
Finally, an obvious way to validate a new measure is to correlate it
with another available valid test measuring the same ability or trait.
However, before using these various criterion measures, you must
take note of the concept of criterion contamination. This is the effect
of any factor or variable on a criterion such that the criterion is no
longer a valid measure. For example, a rating scale (e.g. performance
ratings) is often used to operationalise the criterion variable. Among
the shortcomings of ratings are rating biases, where the rater may err
on being too lenient, or may make judgments on a ‘general
impression’ of a person or on a very prominent aspect only. These
response errors or biases (referred to in Chapter 4) also have a
negative effect on the criterion validity. In such an instance, we would
say the criterion is ‘contaminated’. The criterion must thus be free from
any form of bias. From your statistical knowledge you should realise
that this will influence the correlation coefficient with a predictor. There
are other requirements for criterion measures, too. For example, a
criterion must capture the totality of the construct and not show
criterion deficiency. Also, the measure that is used to operationalise
the criterion construct must show criterion relevance. Finally, the
criterion measure should also be reliable. As you can see, these are
the same high standards that we apply to predictor construct
measures (e.g. intelligence tests). It is important to keep in mind that
criteria (for example, job-performance appraisal measures) need to be
measured well given that we often use them to validate predictor
measures. Because criterion measurement in organisations can be so
challenging, it has often been called the ‘criterion problem’ (Austin &
Villanova, 1992; Schmitt, Arnold & Nieminen, 2017).

5.2.3 Unitary validity


Against the above background, validity is viewed as a multidimensional construct – that is, it consists of the different validity facets (content-description, construct-identification, and criterion-prediction) that we explained above. Several authors (Anastasi, 1986;
Nunnally, 1978) introduced the idea that validity is also not a singular
event, but rather, a process that evolves over a period of time. A ‘body
of evidence’ on validation information regarding validity facets is
developed over a period of time by means of multiple validation events
conducted on diverse groups. Unitary validity therefore considers the
whole ‘body of evidence’ and includes all the facets of validity
mentioned above. It is clear from this viewpoint that the validation
process is never completed, or that a single validation study in a
specific setting that addresses predictive validity, for instance, cannot
fully inform one on the unitary validity of an instrument.
Efforts to validate a particular instrument should therefore address
all facets of validity so that it can inform the unitary validity of the
instrument. Test developers and distributors should systematically
compile validation data on their different instruments so that all validity
facets are included.

5.2.4 Indices and interpretation of validity


We will now discuss validity indices.
5.2.4.1 Validity coefficient
We stated earlier that the predictive validity coefficient is a correlation
coefficient (r) between one or more predictor variables and a criterion
variable. If differential performance leads to the necessity of
determining different validity coefficients for different demographic
groups, such as gender, language or race, it is said that the measure
exhibits differential impact (compare paragraph 5.2.4.2, on the
differential impact of subgroups).
Magnitude of the validity coefficient
How high should a validity coefficient be? Anastasi and Urbina (1997)
and Smit (1996) state that it should be high enough to be statistically
significant at the .05 and .01 levels. Huysamen (1996a) states that it
depends on the use of the test. Values of .30 or even .20 are
acceptable if the test is used for selection purposes. Schmidt and
Hunter (1998) provide a meta-analytic overview of common validity
coefficients for various types of construct measures (to predict job-and
training performance criteria). It is important to keep in mind, however,
that there is a so-called validity ceiling (approximately ±.30), which is often caused
by various statistical artefacts. As such, it is customary to statistically
correct validity coefficients for these artefacts to obtain a better idea of
how important a predictor measure may be in the performance of
people in the workplace context.
5.2.4.2 Factors affecting the validity coefficient
The following factors may impact the validity of a measure:
Reliability
The validity of a measure is constrained by its reliability; that is, the reliability of a measure has a limiting influence (imposes a ceiling) on its validity (also see Schepers, 1992). According to
Schepers (1992) the validity of a test can never exceed the square
root of its reliability. Therefore, there is no point in trying to validate an
unreliable measure. Also, remember that reliability does not imply
validity. Thus, reliability is a necessary but not sufficient precondition
for validity.
Differential impact of subgroups
The nature of the group under investigation is also important. The
validity coefficient must be consistent for subgroups that differ in age,
gender, educational level, occupation, or any other characteristic. If
this is not the case, these subgroups (according to biographic factors)
may have a differential impact on the validity coefficient.
Sample homogeneity
Sample homogeneity is also relevant. If scores are very similar
because group members are very similar (for instance a group of
postgraduate students), we may have a ‘restriction of range’ case.
Therefore, the wider the range of scores (sample heterogeneity), the
higher the validity coefficient (refer to range restriction in Section
4.2.3.1).
Linear relationship between the predictor and criterion
The relationship between predictor and criterion must be linear
because the Pearson product-moment correlation coefficient is used.
But what do we do if the relationship is non-linear or there is a change
in the slope of the regression line (a spline)?
Criterion contamination
If the criterion is contaminated it will affect the magnitude of the
validity coefficient. However, if contaminated influences are known
and measured, their influence can be corrected statistically by means
of the partial correlation technique.
Moderator variables
Variables such as gender, age, personality traits, and socio-economic
status may affect the validity coefficient if the differences between
such groups are significant. These variables are known as moderator
variables and need to be kept in mind, as a specific measure may be
a better predictor of criterion performance for, say, men than for
women, thus discriminating against or adversely impacting the latter.
You will learn more about adverse impact and test bias in Chapter 7.
5.2.4.3 Coefficient of determination
The coefficient of determination (r²) is the square of the validity
coefficient. It indicates the proportion of variance in the criterion
variable that is accounted for by variance in the predictor (test) score,
or the amount of variance shared by both variables (Huysamen,
1996a; Smit, 1996).
Figure 5.2 Common variances between predictor and criterion variables

Look at the following example:


Validity coefficient: r = .35
Coefficient of determination: r² = .35² = .1225

r² can be expressed as a percentage of common variance: .1225 × 100 = 12.25%
This can be depicted graphically as in Figure 5.2.

SELF-ASSESSMENT EXERCISE 5.5

With reference to our example above:


(a) What would the coefficient of determination be for the Aptitude
Test X and the Aptitude Test B?
(b) Is this an acceptable coefficient?
(c) What is the shared common variance between X, B, and Y?
(d) Why do you say this?

5.2.4.4 Standard error of estimation


When it is necessary to predict an individual’s exact criterion score,
the validity coefficient must be interpreted in terms of the standard
error of estimation. The formula is as follows:

formula 5.1:   SEest = sy √(1 − r²xy)
where:
sy = standard deviation (SD) for scale Y
r²xy = coefficient of determination for scales X and Y
The standard error of estimation is interpreted in the same way as a
standard deviation. If, for example, rxy = .60 and SEest = .80, where sy
= 1, it can be accepted with 95 per cent certainty that the true criterion
score will not deviate (plus or minus) more than (1.96 × .8) = 1.57
from the predicted criterion score.

SELF-ASSESSMENT EXERCISE 5.6

With reference to our example (Exercise 5.2), aptitude test scores (x) were
used to predict performance scores within a 95 per cent confidence interval:
(a) Compute the standard error of estimation of the job
performance criterion where rXY = 0.61, sY = 1.42.
(b) How much would these scores vary about an obtained score of 30?
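
If you want to verify your computation, here is a minimal Python sketch of formula 5.1 (the function name is our own; we use the concurrent validity coefficient of .611 from Exercise 5.2):

    import math

    def se_est(sd_y, r_xy):
        # Standard error of estimation (formula 5.1): sy * sqrt(1 - r^2)
        return sd_y * math.sqrt(1 - r_xy ** 2)

    se = se_est(1.42, 0.611)
    print(round(se, 2))          # about 1.12
    print(round(1.96 * se, 2))   # 95 per cent limits: about 2.20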

5.2.4.5 Predicting the criterion: Regression analysis


Essentially, regression has to do with prediction. If there is a high
positive correlation between a measure and a criterion, the test score
can be used to predict the criterion score. These predictions are
obtained from the regression line, which is the best-fitting straight
(linear) line through the data points in a scatter diagram.
Regression analysis always involves one criterion variable. Simple
regression implies that we have only one predictor variable, while we
talk of multiple regression when we have two or more predictor
variables.
The regression equation (formula for the regression line) for simple
regression is as follows:

formula 5.2:   Ŷ = a + bX

The symbol Ŷ represents the predicted Y-score (the criterion). The regression coefficient is designated by b. This is the slope of the regression line, that is, how much change in Y is expected each time the X-score (the predictor) increases by one unit. The intercept a is the value of Ŷ when X is zero. An interesting fact is that the means of the two variables will always fall on the regression line.
The regression equation for a multiple regression with three
predictors is:

formula 5.3:   Ŷ = b0 + b1X1 + b2X2 + b3X3

where:
X1, X2, X3 = predictors 1, 2, 3
b1, b2, b3 = weights of the respective predictors
b0 = intercept (i.e. the a in simple regression).

SELF-ASSESSMENT EXERCISE 5.7

You have conducted the correlation and regression between the Aptitude Scores
(X) and the Job Performance Scores (Y) in Exercise 5.3. If b = .22 and a
= 4.91: What would the predicted value of Ŷ be for value X = 16?
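
You can also fit the regression line directly in Python. The sketch below (our illustration) estimates a and b from the Table 5.1 data with numpy.polyfit; the fitted coefficients differ slightly from the rounded values of .22 and 4.91 used in the exercise, so its prediction of about 8.44 agrees with the exercise answer of 8.43 up to rounding:

    import numpy as np

    x = np.array([11, 18, 14, 17, 13, 14, 9, 15, 11, 5])
    y = np.array([7, 8, 10, 8, 6, 9, 6, 9, 8, 6])

    # Fit the simple regression line Y-hat = a + bX (formula 5.2)
    b_hat, a_hat = np.polyfit(x, y, 1)        # slope, then intercept
    print(round(b_hat, 2), round(a_hat, 2))   # about .22 and 4.85

    # Predicted job performance for an aptitude score of 16
    print(round(a_hat + b_hat * 16, 2))       # about 8.44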

» ETHICAL DILEMMA CASE STUDY 5.1

Dr Phumzile Mthembu, our specialist consultant and selection expert, used a somewhat biased learning-potential test and a partially differentially functioning
managerial aptitude test for selecting candidates from different cultural
backgrounds for an accelerated management-development programme. After
completion of the management-development programme, Dr Mthembu
determined what the individual candidates’ course outcomes were. By using the
managerial aptitude and learning-potential scores as predictors, a model for
predicting programme performance was developed. Dr Mthembu established that
55 per cent of the variance in programme performance could be explained by
learning potential and managerial aptitude scores. Dr Mthembu consequently
decided to use this model as a basis for the selection of future
candidates for the management development programme.

Questions
(a) Should Dr Mthembu proceed to use the predictive model as
sufficient evidence for the validation of his selection model?
What is the basis for your argument?
(b) On the basis of what you know about the learning potential and the
managerial aptitude tests, would the regression slopes and the
intercepts of the regression lines be the same for the different
culture/race groups in predicting programme performance?
(c) Is Dr Mthembu’s selection model equitable and fair for the
different culture groups?

5.3 Conclusion
This chapter introduced and defined the concept ‘validity’. Three broad
types of validity were introduced and within these three categories
different variations of validity were explained. The rest of the chapter
focused on the application of validity, factors that affect validity
coefficients, and the different validity indices used in practice. What
you have learned thus far in Chapter 5 will enable you to establish or
assess whether measures are valid or generate data that is valid.
It would also be helpful if you could get hold of the Guidelines for
the Validation and Use of Assessment Procedures compiled by the
Society for Industrial and Organisational Psychology of South Africa
(SIOPSA, 2006b). These guidelines were developed to assist
psychologists and assessment practitioners in industry to adhere to
the requirements of the Employment Equity Act (No. 55 of 1998) to
ensure that assessment practices are reliable, valid, unbiased, and
fair.

ANSWERS TO SELF-ASSESSMENT EXERCISES 5.1–5.7


5.1 (a)
(b) Yes, because it is correlated with another similar construct.
5.2 (a) Concurrent validity is: .611
(b) Yes, it is acceptable.
5.3 (a) The predictive validity would remain the same, namely .611.
(b) No, the numbers are the same.
(c) It seems as if they are valid.
(d) Unreliable measures would normally not yield significant correlations.
5.4 (a) Yes, X correlates .611 (r² = .373) with Y and B correlates .341 (r² = .116) with Y.
(b) Test X thus shares more variance with Y.
5.5 (a) Coefficient of determination for X (r² = .373) and for B (r² = .116)
(b) It seems that X is acceptable, but B is not.
(c) The percentage shared variance is 37.3% for X and 11.6% for B.
(d) Based on the obtained coefficients of determination in (c).

5.6 (a) The standard error of estimation is: sY√(1 − r²XY) = 1.42 × √(1 − .373) = 1.124
(b) The true score will (with 95% certainty) not deviate more than 2.2 from the predicted score of 30, that is, vary between 27.8 and 32.2 (1.96 × 1.124 = 2.20)
5.7 Predicted value of Ŷ where X = 16 is: 4.91 + (.22 × 16) = 8.43

OTHER INTRODUCTORY TEXTBOOKS


Cohen, R.J. & Swerdlik, M.E. 2010. Psychological testing and assessment: An
introduction to tests & measurement. 7th ed. Boston: McGraw-Hill.
Domino, G. & Domino, M.L. 2006. Psychological testing: An introduction. 2nd ed.
Cambridge, New York: Cambridge University Press.
Hogan, T.P. & Cannon, B. 2003. Psychological testing: A practical introduction. New
York: Wiley & Sons.

ADVANCED READING
Anastasi, A. & Urbina, S. 1997. Psychological testing. 7th ed. Upper Saddle River, NJ:
Prentice-Hall.
Cronbach, L.J. 1990. Essentials of psychological testing. 5th ed.
New York: Harper & Row.
Gregory, R.J. 2010. Psychological testing: History, principles and applications. 5th ed.
Boston: Pearson Education.
Groth-Marnat, G. 2003. Handbook of psychological assessment. 4th ed. Hoboken, NJ:
John Wiley & Sons.
Miller, L.A., Lovler, R.L. & McIntire, S.A. 2011. Foundations of psychological testing: A
practical approach. 3rd ed. Thousand Oaks, CA: Sage.
Murphy, K.R. & Davidshofer, C.O. 2005. Psychological testing: Principles and
applications. 6th ed. Upper Saddle River, NJ: Prentice Hall.
Nunnally, J.C. & Bernstein, I.H. 1994. Psychometric theory. New York: McGraw-Hill.
Chapter 6
Developing a psychological measure
CHERYL FOXCROFT

CHAPTER OUTCOMES
By the end of this chapter you will be able to:
❱ list the steps involved in developing a measure
❱ explain the essential psychometric properties and item characteristics
that need to be investigated when developing a measure
❱ discuss how technology has revolutionised test development
❱ explain what a test manual should contain
❱ explain how to evaluate a measure.

6.1 Introduction
Developing a psychological measure is a time-consuming and people-
intensive process. Using technology has improved the efficiency of the
test-development process but has also added additional aspects such
as designing multimedia items that involve drawing on the expertise of
graphic designers and technologists. It takes three to five years to
develop a new measure from the point at which the idea is conceived
to the moment when the measure is published and becomes available
to the consumer. In-depth, advanced psychometric knowledge and test-development expertise are required by those who initiate and guide
the development process. It is beyond the scope of this book to
provide detailed information regarding the development of a measure.
Instead, this chapter will give you a brief glimpse of the steps involved
in developing a measure. The main purpose in outlining the steps in
developing tests is to enable you to evaluate the technical quality of a
measure and the information found in a test manual.

6.2 Steps in developing a measure


Have you ever had the opportunity to develop a test of any sort?
Maybe it was a short test that you compiled for you and your friends to
work out when you were studying for your Grade 12 exams? Or
maybe it was one that you developed for your younger brother or
sister when you were helping them prepare for a test at school? What
was the first thing that you needed to know? That’s right! The subject
the test had to cover. Next, you had to decide if the test would be
paper-based and completed with a pen or pencil or whether you were
going to create the test on your smartphone using one of the Apps
that enable you to generate a questionnaire and record the answers.
The next step was to consider the number and type of questions you
wanted to include. Then you had to write or type the questions,
making sure that the wording was as clear and as understandable as
possible. You then probably gave the test to your brother, sister, or
friends. Once they had completed the test, it had to be marked.
Marking could be done electronically if the App that you used allowed
you to punch in the correct answers. If the test was paper-based, you
would have done the marking manually using the memorandum that
you developed in which you wrote down the correct answer to each
question. You would have found the memorandum very useful if your
brother, sister, or friends complained about your marking afterwards!
The process of developing a psychological measure is similar to
that of the informal test described above, only it is a little more
complex. Briefly, a psychological measure needs to be planned
carefully, items need to be written or constructed, and the initial
version of the measure needs to be administered so that the
effectiveness of the items can be determined. After this, the final items
are chosen and the measure is administered to a representative group
of people so that the measure’s validity, reliability, and norms can be
established. Finally, the test manual is compiled.
Now that you have the ‘big picture’, let’s add a little more detail.
The phases and steps involved in the development of a psychological
measure are set out in Table 6.1.
This probably looks like quite an intimidating process. So, let’s take
each of the phases and steps listed in the table and amplify them. You
should also be aware that although there are different phases in
developing a measure, the process is dynamic. Sometimes a previous
phase is revisited based on a decision made during a subsequent
phase (Foxcroft, 2004c; Wilson, 2004).
6.2.1 The planning phase
We need a blueprint or a plan for a measure that will guide its
development. Foxcroft (2004c) argues that the planning phase is not
always given sufficient emphasis and that when a measure is
developed for use in a multicultural and multilingual context, a
considerable amount of time needs to be spent planning it. This
section will give you insight into the key elements of a test plan.
6.2.1.1 Establish a test-development team
Given the complexities involved in developing a measure, a diverse
team of experts needs to be assembled. There are no set
prescriptions for how to put together such a team as team
membership will vary according to what is being measured, and the
mode and format of the test. The following people could form part of a
test-development team:
A team/project leader who should have psychometric expertise as well
as experience in the construct domain(s) and in the key area of
application of the test.
Specialists in the construct domains that will be included in the test.
Psychometric and statistical specialists.
Cultural and linguistic experts, if the test is being developed for a
multicultural and/or multilingual context (see Section 7.7 in Chapter 7).
Specialists in the field of psychology in which the test will be applied.
Graphic designers, technologists, software developers and engineers,
game developers, and so on, if a computer-based test is being
developed.
Administrative support.
Table 6.1 The development of a psychological measure (phases and their specific steps)

Planning:
• Establish a test-development team
• Specify the aim, target population, and nature of the measure
• Define the content of the measure
• Consider other test specifications to include in the plan

Item development:
• Develop or source the items
• Review the items

Assembling and pretesting the experimental version of the measure:
• Arrange the items
• Finalise the length
• Answer mechanisms and protocols
• Develop administration instructions
• Pre-test the experimental version of the measure

Item analysis:
• Determine item difficulty values
• Determine item discrimination values
• Investigate item bias
• Identify items for final pool

Revising and standardising the final version of the measure:
• Revise test and item content
• Select the items for the standardisation version
• Revise and standardise administration and scoring procedures
• Compile the final version
• Administer the final version to a representative sample of the target population

Technical evaluation and establishing norms:
• Establish validity and reliability
• Devise norm tables, setting performance standards or cut-points

Publishing and ongoing refinement:
• Compile the test manual
• Submit the measure for classification
• Publish and market the measure
• Revise, refine, and update continuously

As you can see from the above, should all these people be included,
the team could be large, which could reduce the efficiency of the test-
development process. Figure 6.1 is an example of a more efficient
approach, in which a core team takes overall responsibility for
directing and managing all aspects of the test-development process.
The core team will normally be decided on first, and then they will
identify the other teams and specialists needed to undertake aspects
of the development process. These specialist teams are activated at
relevant points in the process. For example, the construct domain
team(s) could be consulted to map out the domains and subdomains
that the measure will tap. This team could also be involved in item
development, sometimes in collaboration with other teams (e.g. the
design and technology specialists). Once the experimental version of
the measure has been administered, and item and factor analysis
results are available, the construct domain team can identify
problematic items and revise or replace these items. The core team, in
consultation with the construct domain team, will also sign off on all
the items in the final version of the measure.
Figure 6.1 An efficient approach to manage the relationship between the core and expert
test-development teams

6.2.1.2 Specify the aim, target population, and nature of the measure
The core test-development team develops a test plan that contains
the specifications of the test and will guide its development. As a first
step in developing the plan, the following aspects need to be clearly
stated:
the purpose or aim of the measure
what attribute, characteristic, or construct it will measure
whether the measure is to be used for screening purposes or for in-
depth diagnostic assessment, or for competency-based selection and
training purposes
what types of decisions could be made based on the test scores
for which population (group of people) the measure is intended (e.g.
young children, adults, people with a Grade 12 education). Included in
this should be consideration of whether the measure is to be used in a
multicultural and multilingual context and, if so, which cultural and
language groups it could be used with
whether the measure can be individually administered and/or
administered in a group context
whether the measure is paper-based, computer-based or
performance-based (see also 6.2.1.4)
whether it is a normative measure (where an individual’s performance
is compared to that of an external reference or norm group), an
ipsative measure (where intra-individual as opposed to interindividual
comparisons are made when evaluating performance), or a criterion-
referenced measure (where an individual’s performance is interpreted
with reference to performance standards associated with a clearly
specified content or behavioural domain such as managerial decision-
making or academic literacy).
The box below provides an example of the wording of the aim of a
fictitious psychological measure as it would appear in the test plan
and manual.

Name: South African Developmental Screening Test (SADST).


Target population and age range: Infants and children from birth
to five years and 11 months from all South African cultural groups.
Aim/purpose: To screen the developmental strengths and weaknesses of
infants and young children to plan appropriate enrichment interventions.
Type of measure: Criterion-referenced; screening; employing observation of
children performing tasks, and input from parents to questions recorded on paper.

6.2.1.3 Define the content of the measure


The content of a measure is directly related to the purpose of the
measure. How does a test developer decide on the domains of
content to include in a measure?
Firstly, the construct (content domain) to be tapped by the
measure needs to be operationally defined. For tests used in clinical
settings, this can be done by undertaking a thorough literature study of
the main theoretical viewpoints regarding the construct that is to be
measured and the context in which it will be measured (this is known
as the rational method). In educational settings, learning outcomes in
specific learning areas, subjects or programmes form the basis for
defining the constructs to be tapped. In organisational settings, test
developers base the operational definition of the construct to be
tapped on a job analysis that identifies the competencies (i.e.
knowledge, skills or tasks, and attitudes) needed to perform a job
successfully (McIntire & Miller, 2006). In this regard, competency sets
have been developed for a number of jobs, which could provide a
useful starting point before customising the competencies for a
specific organisational context. For example, the United States
Department of Labor (2017) provides an online resource containing
the competency requirements of over 10 000 work roles, along with
the measures that tap into these.
Secondly, the purpose for which the measure is developed must
be considered. If the measure needs to discriminate between different
groups of individuals, for example to identify high-risk students who
need extra attention, context-sensitive information will have to be
gathered about the aspects of the construct on which these groups
usually differ (this is known as criterion keying). For example, when it
comes to a construct such as academic aptitude, high-risk students
find it very difficult to think critically, while low-risk students have less
difficulty in this regard. Therefore, if a measure aims to identify high-
risk students, items related to critical thinking should be included in it.
Thirdly, Foxcroft (2004c) points out that when a measure is
developed for use with multicultural and multilingual groups, it is
critical that the construct to be tapped is explored in terms of each
cultural and language group’s understanding of it as well as its value
for each group. For example, if a test of emotional intelligence is to be
developed for all cultural and language groups in South Africa, the test
developer needs to explore, through focus groups and individual
interviews, how each group understands the concept of ‘emotional
intelligence’ to see whether there is a shared understanding of the
concept. A further example can be found in the way in which the
South African Personality Inventory (SAPI) was developed (Nel,
Valchev, Rothmann, van de Vijver, Meiring & De Bruin, 2012;
Fetvadjiev, Meiring, Van de Vijver, Nel & Hill, 2015). For the SAPI
project a broad range of ‘everyday’ adults were interviewed across all
the 11 official language groups in South Africa to get information on
the descriptors that they use to describe personality. From this, the
common and unique personality dimensions of the different language
groups were distilled, which guided the development of the SAPI items
(Valchev, Nel, Van de Vijver, Meiring, De Bruin & Rothmann, 2013;
Valchev, Van de Vijver, Nel, Rothmann & Meiring, 2013).
A test developer rarely uses only one approach when it comes to
operationalising the content domain(s) and subdomains of a measure.
The more rational or analytical approach is often combined with the
empirical criterion-keying approach, while also centring understanding
of the construct within the African context. This ensures that the
resultant measure is theoretically grounded, linked to an important
criterion, and is culturally and contextually sensitive. The more
attention given to comprehensively delineating the construct, the
easier it will be to operationalise it when developing the test items
(Wilson, 2004). Furthermore, when the items are tried out in the
subsequent phases, factor analysis can be used to assist in refining
the underlying dimensions (constructs) being tapped by the measure.
Wilson (2004) recommends that the work done to delineate the
construct or content domain should culminate in a construct map.

For example, a combination of the rational and criterion-keying approaches


together with culturally appropriate information could be used in the development
of the fictitious SADST referred to in 6.2.1.2. A review of the international and
South African literature would be undertaken to identify critical areas/domains of
development that the SADST should tap. The developmental domains and their
subdomains identified using the rational approach could be:
• Communication (receptive and expressive)
• Motor (gross, fine, and perceptual-motor)
• Cognitive (reasoning and problem-solving, attention, and memory)
• Personal-social (interacting with adults and peers, self-concept, independence).

To inform the items that need to be developed in the subdomains, critical


developmental milestones could be identified (i.e. criterion-keying approach). For
example, if child-development theory and research shows that it can be expected
that a 3-year-old child should be able to thread a few beads, a task could be
developed where a child is given a piece of string and four beads and asked to
thread them to assess fine-motor functioning. At the age of 5, more would be
expected, so the test task would then be to ask the child to thread eight beads
on a string and to alternate between red and yellow beads, starting with a red
one. By developing items around critical developmental milestones, children
who successfully complete items in their age range are likely to be on track
developmentally while those who are not successful would be identified as
possibly having a weakness in this aspect of their development. The construct
map for the SADST can be represented in a table or concept map to reflect
the domains, subdomains and critical developmental milestones. Using the
information provided in this block, see if you can develop a table or concept
map of the basic elements that the SADST will cover.

The last step in defining the content of the measure is to specify the
number of items per construct and subconstruct, taking into account
biographical variables (e.g. age groups) and/or psychometric
considerations (e.g. range of difficulty levels, cognitive load).

In the development of the fictitious SADST referred to in 6.2.1.2,


the item specifications for the construct domain of ‘Motor (gross,
fine, and perceptual motor)’ are:

Year  Difficulty level  Gross  Fine  Perceptual-motor
1     Easy              2      2     1
      Average           2      2     1
      Difficult         1      1     1
2     Easy              2      2     2
      Average           2      2     2
      Difficult         1      1     1
3     Easy              1      2     2
      Average           2      2     2
      Difficult         2      1     2
4     Easy              1      1     2
      Average           2      2     2
      Difficult         2      2     2
5     Easy              1      1     1
      Average           2      2     2
      Difficult         2      2     2

Reflect on why for some of the subconstructs there are differing


numbers of items. What could this be linked to?

6.2.1.4 Other test specifications to include in the plan


The testing mode (paper-, computer- or performance-based) will
guide many of the remaining aspects of the test specifications such as
its format, the nature of the items, the types of responses,
administration and scoring, and so on.
If a computer-based test (CBT) is developed, the test plan needs
to include aspects such as the type of device(s) that will be used for
test-taking purposes, the screen size and clarity, font characteristics,
and line length and spacing (Leeson, 2006). Screen clarity is of
interest as studies have suggested that computer screens take longer
to read than printed materials (Kruk & Muter, 1984). Another aspect
that needs to be addressed in a test plan for a CBT involves the
sourcing or development of the CBT system and software, and the
model adopted for this (Fagbola, Adigun & Oke, 2013).
The format of the test needs to be considered. Test format
consists of two aspects: ‘a stimulus to which a test-taker responds and
a mechanism for response’ (McIntire & Miller, 2000, p. 192). Test
items provide the stimulus and the more common item formats are
the following:
open-ended items, where no limitations are imposed on the response
of the test-taker
forced-choice items, for example, multiple-choice items, where
careful consideration needs to be given to the distracters (alternative
options) used, and true-false items, where the test-taker responds to
each individual item. However, in an alternative forced-choice format,
namely an ipsative format, the test-taker must choose between two or
more attributes for each item (e.g. ‘Do you prefer science or
business?’). In this instance, the relative preference for each attribute
can be determined and rank-ordered for an individual, but the absolute
strength of each attribute for an individual cannot be determined
sentence-completion items
performance-based items such as where apparatus (e.g. blocks)
needs to be manipulated by the test-taker, a scientific experiment
needs to be performed, an essay must be written, an oral presentation
must be prepared, or a work-related competency must be
demonstrated through solving complex realistic problems via
innovative interactive computerised activities (Linn & Gronlund, 2000;
Williamson, Bauer, Steinberg, Mislevy, Behrens & DeMark, 2004).
Such items test problem identification and problem-solving, logical
thinking, organisational ability, and oral or motor performance, for
example. Traditionally, some performance-type items (e.g. an essay)
have been more difficult to score objectively but modern computer
technology has enhanced the objective scoring of computer-based
performance-type tasks
multimedia items are dynamic in nature and integrate combinations
of text, audio, video clips, animations, and so on.

While the choice of item format is linked to what is to be measured,


practical considerations also have a bearing on the choice of format.
For example, if the measure is to be administered in a group context
and large numbers of people are to be evaluated on it, it needs to
be easily scored. This implies that the use of fixed-response items
(e.g. true/false, multiple choice) would be a better choice than an
essay-type or open-ended format for a paper-based test.
When it comes to the method of responding to an item, there are
also various formats such as the following:
objective formats, where there is only one response that is either
correct (e.g. in multiple-choice options and in matching exercises), or
is perceived to provide evidence of a specific construct (e.g. in true-
false options, or the value assigned to a statement along a rating
scale).
subjective formats, where the test-taker responds to a question
verbally (e.g. in an interview) or in writing (e.g. to an open-ended or
essay-type question) and the interpretation of the response as
providing evidence of the construct depends on the judgement of the
assessment practitioner (McIntire & Miller, 2006). Projective tests,
such as the Rorschach Inkblot Test or the Thematic Apperception
Test (see Chapter 12), are examples of a subjective answer format. In
these tests, ambiguous pictures are presented (stimulus) to which
test-takers must respond by either describing what they see or by
telling a story.

Linked to the response to an item stimulus is whether a time limit is
imposed. While all speed tests are timed, many cognitive tasks, and
especially performance-type items as well as measures administered
in group contexts, also have time limits. If a test is to be timed, this
will place restrictions on the number of items that can be included.
However, it is interesting to note that Kennet-Cohen, Bronner, and
Cohen (2003), working in a multicultural context, found that for the
Psychometric Entrance Test used in Israeli universities, setting an
overall time limit for the test battery of 150 minutes was more effective
than setting time limits for the individual subtests, and that predictive
validity increased by 20 per cent in this instance. The Centre for
Access Assessment and Research at the Nelson Mandela University
has found that the assessment of university applicants in a
multicultural group context is enhanced if all time limits are removed
as this provides a more accurate picture of the proficiencies of
applicants (Foxcroft, 2008).
Bias can unintentionally be introduced through either the item
stimulus or the mechanism of response. It is critically important to pay
attention to the potential for method and response bias, especially
when the measure is developed for use with multicultural test-takers
(Foxcroft, 2004c; Hambleton, 1994, 2001). For example, Owen
(1989a) found that black and white South African children performed
differently on story-type items. Such items should thus be avoided if
the aim is to develop a test for use with black and white children.
When she developed the LPCAT, De Beer (2005) developed only
non-verbal figural items (figure series, figure analogies, and pattern
completion) as such item types are ‘recommended for the fair
assessment of general reasoning ability in multicultural contexts’ (p.
723). As regards response bias, this aspect is dealt with in Chapter 17
when we discuss the impact of computer familiarity on computer-
based test performance.
Other than method and response bias, test-takers are a potential
source of bias themselves as they might respond by using a specific
style or response set/bias (e.g. by agreeing with all the statements)
that might result in false or misleading information. Response set/bias
is discussed more fully in Chapters 4, 12, and 17. However, test
developers should try to minimise the possibility of response sets
influencing the test results through the way that they construct the test
items.
Another potential source of bias relates to the language in which
the measure is developed and administered. If the measure is
intended to be used in a multilingual context, then only having the
measure available in English, for example, could be a potential source
of bias for test-takers who are second-language speakers. In this
instance, it would be better to have multiple language versions of the
measure. How to deal with issues related to language in test
development and administration will be discussed in Chapters 7 and
9.
The eventual length of the measure also needs to be considered
at this stage. The length is partly influenced by the amount of time that
will be available to administer the measure. The length also depends
on the purpose of the measure. A measure that is developed to tap a
variety of dimensions of a construct should have more items than a
measure that focuses on only one dimension of a construct, or else
the reliability of the measure could be problematic.
Having considered all the aspects related to the dimensions of the
construct to be tapped, the item and response format, whether time
limits will be set for the responses, test length, and the number of
items, the test developer now has a clear conceptualisation of the
specifications for the measure. This is formalised in the test plan
specifications, which specify the content domains to be included and
the number of items that will be included in each. For example, part of
a test plan for an emotional intelligence test could indicate that four
aspects of the construct will be covered, namely, interpersonal
relations, intrapersonal functioning, emotional well-being, and stress
management, and that there will be 10 items per aspect.
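
To make the idea concrete, a test plan's item specifications can be captured in a simple data structure. The sketch below (in Python) uses the hypothetical emotional intelligence plan just described; the names and structure are purely illustrative, not a prescribed format.

```python
# A minimal, hypothetical sketch of recording a test plan's item
# specifications; the domain names follow the emotional intelligence
# example in the text.
TEST_PLAN = {
    "construct": "Emotional intelligence",
    "items_per_domain": {
        "Interpersonal relations": 10,
        "Intrapersonal functioning": 10,
        "Emotional well-being": 10,
        "Stress management": 10,
    },
}

def planned_test_length(plan):
    """Total number of items implied by the plan's specifications."""
    return sum(plan["items_per_domain"].values())

print(planned_test_length(TEST_PLAN))  # 40
```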

» CRITICAL THINKING CHALLENGE 6.1

Develop key aspects of the test plan for the SADST that was outlined in
6.2.1.2 and 6.2.1.3. Expand the table/concept map that you developed to
map the construct domains and subdomains and add the number of
items for each subdomain, remembering that the SADST covers the age
range from birth to 5 years. Also think about the format of the items and
the nature of the responses that will be required.

ADVANCED READING

Interdisciplinary collaboration across fields such as cognitive and educational


psychology, psychological and educational measurement, computer science
and technology, and statistics has had an innovative impact on the design
and development of psychological and educational measures, and the
emergence of the field of assessment engineering (AE). ‘By using construct
maps, evidence models, task models and templates, assessment engineering
(AE) makes it possible to generate extremely large numbers of test forms with
prescribed psychometric characteristics (e.g. targeted measurement
precision)’ (Luecht, 2009, p. 1). To read more about this innovative
methodology, consult Luecht (2008, 2009); Gierl, Zhou and Alves (2008); and
Zhou (2009). You can find the Web addresses to access these readings
online in the References section at the end of this book.

6.2.2 Item development, creation or sourcing


6.2.2.1 Develop the items
Usually a team of experts writes, develops or sources the items. The
purpose of the measure and its specifications provide the constant
guiding light for the item developers. A variety of sources can be
consulted for ideas for items: existing measures, theories, textbooks,
curricula, self-descriptions, anecdotal evidence in client files, open
source item banks, and so on.
Depending on the type of measure being developed, the following are a
few important pointers for writing items:
Wording must be clear and concise. Clumsy wording and long,
complex sentences could make it difficult for test-takers to understand
what is required of them.
Use vocabulary that is appropriate for the target audience.
Avoid using negative expressions such as ‘not’ or ‘never’, and double
negatives in particular.
Cover only one central theme in an item.
For reading comprehension and case-study tasks, keep in mind that
the ‘millennials’, or ‘Google generation’, spend little time reading and
their reading behaviour is different. Reading is about ‘skimming,
dipping in and out of texts’ (Zinn & Langdown, 2011, p. 109). This
suggests that the length of the text used as well as the positioning of
text on the screen or page requires careful consideration.
Avoid ambiguous items.
Vary the positioning of the correct answer in multiple-choice
measures.
All distracters for multiple-choice response alternatives should be
plausible (i.e. the distracters should be as attractive as the correct
answer). As far as the number of alternative response choices is
concerned, Linn and Gronlund (2000) suggest that it is best to provide
four answer choices. This implies that three good distracters need to
be developed, which they feel can be reasonably achieved. However,
for children, they suggest that three alternatives are used to reduce
the amount of reading required.
True and false statements should be approximately the same length
and the number of true statements should be approximately equal to
the number of false statements.
• The nature of the content covered should be relevant to the
purpose of the measure (e.g. you would not expect a personality
measure to contain an item asking what the capital of Kenya is).

When it comes to writing items for children’s measures, the stimulus


material should be as colourful and attractive as possible. Varied
types of tasks should be included, and where stimulus materials must
be manipulated, the child should be able to manipulate them without
difficulty (e.g. blocks and form-board shapes for young children
should be large).
Within the African context, choosing appropriate stimuli and
familiar objects is of critical importance for the assessment measure
to be culturally sensitive and relevant. An example of an African-
centred way to assess non-verbal cognitive reasoning is provided in
the box below.

AFRICAN-CENTRED NON-VERBAL COGNITIVE REASONING STIMULI

Zuilkowski et al. (2016, p. 341) argue that while Westernised children ‘are exposed
to extensive two-dimensional materials during early childhood, such as picture
books and photographs, most rural African children are not’. Most Western-oriented
tasks of non-verbal reasoning involve books of line drawings and visual patterns that
need to be completed, for example. Such two-dimensional stimuli could be foreign to
(rural) African children and result in the assessment results being biased. To create
more appropriate stimuli to tap non-verbal reasoning in Zambian children,
Zuilkowski et al. (2016) developed the Object-based Pattern Reasoning Assessment
(OPRA). Instead of drawings, OPRA items use familiar objects such as beans,
stones, and bottle caps. The assessment practitioner uses a sheet of paper or
cardboard with a grid of six squares on it, and puts five red stones in five squares of
the grid, leaving one square empty. The child is then given a choice of four
objects, one of which will be a red stone. ‘The child can answer by pointing to their
chosen item, naming their selection verbally, or by placing the item in the correct cell
in the grid’ (p. 346). When children’s performance on the OPRA was compared with their
performance on two-dimensional stimuli, the OPRA was found to be more valid. The
work done to develop the culturally sensitive stimuli for OPRA reminds us of the
importance of investing time and creativity into
developing tests containing indigenous stimuli and tasks.
Source: Zuilkowski, S.S., McCoy, D.C., Serpell, R., Matafwali, B., and Fink,
G. 2016. Dimensionality and the development of cognitive assessments for
children in sub-Saharan Africa. Journal of Cross-Cultural Psychology,
47(3), pp. 341-354. DOI: 10.1177/0022022115624155

It is beyond the scope of this chapter to cover the development of


digitalised items for use in CBT and e-assessment. The block below
provides readings on this topic.

ADVANCED READING
Drozdick, L.W., Getz, K., Raiford, S. E., & Zhang, O. 2016. WPPSI-
IV: Equivalence of Q-interactive and paper formats. Q-
interactive Technical Report 14. Bloomington, MN: Pearson.
Retrieved 3 May 2018 from
http://www.helloq.com/content/dam/ped/ani/us/helloq/media/WPP
Qi-Tech-Report-14-FNL.pdf
Gelderblom, J.H. 2008. Designing technology for young children:
Guidelines grounded in a literature investigation on child
development and children’s technology. Unpublished doctoral
dissertation, University of South Africa.
Pade, H. 2014. The evolution of psychological testing: Embarking on
the age of digital assessment. In R. Valle & J. Klimo (Eds.), The
changing faces of therapy: Innovative perspectives on clinical
issues and assessment. Retrieved 3 May 2018 from
https://www.helloq.com/content/dam/ped/ani/us/helloq/media/The
Parshall, C. G., Harmes, J.C., Davey, T. & Pashley, P. 2010.
Innovative items for computerised testing. In W.J. van der
Linden & C.A.W. Glas (Eds.), Computerised adaptive testing: Theory
and practice (2nd ed). Norwell, MA: Kluwer Academic
Publishers.
Scalise, K. & Gifford, B. 2006. Computer-based assessment in e-
learning: A framework for constructing ‘intermediate constraint’
questions and tasks for technology platforms. Journal of
Technology, Learning, and Assessment, 4(6). Retrieved 21
January 2018 from https://files.eric.ed.gov/fulltext/EJ843857.pdf

As some of the items will be discarded during the test-development,


item-analysis, and refinement phases, provision needs to be made for
including many more items in the experimental version than will be
required in the final version. As a general rule of thumb, test
developers usually develop double the number of items than they
require in their final item pool (Owen & Taljaard, 1996). At least one
third of the items are usually discarded when the item analysis is
performed.
6.2.2.2 Review the items
After a pool of items has been developed, it should be submitted to a
panel of experts for review and evaluation. This panel could be
comprised of construct or content domain experts, cultural and
language experts, and experts in the context where the measure will
be used. For example, if the measure is being used to assess
emotional intelligence in the workplace, including industrial and
organisational psychologists and HR development practitioners could
strengthen the review panel. The expert reviewers will be asked to
judge whether the items sufficiently tap the content domain or
dimensions of the construct being assessed. They will also be asked
to comment on the cultural, linguistic, and gender appropriateness of
the items. The wording of the items, and the nature of the stimulus
materials, will also be closely scrutinised by the panel of experts.
The pool of items could also be administered to a small number of
individuals from the intended target population to obtain qualitative
information regarding items and test instructions that they have
difficulty understanding.
Based on the review panel’s recommendations as well as data
gathered from trying out the items, certain items in the pool may have
to be revised, or even rewritten.
6.2.3 Assemble and pre-test the experimental version
of the measure
In getting the measure ready for its first experimental administration,
several practical considerations require attention.
6.2.3.1 Arrange the items
The items need to be arranged or sequenced in a logical way in terms
of the construct being measured. Furthermore, for paper-based tests,
items need to be grouped or arranged on the appropriate pages in the
test booklet.
6.2.3.2 Finalise the length
Now that the experimental item pool is available, the length of the
measure needs to be revisited. Although sufficient items have to be
included to sample the construct being measured, the time test-takers
will need to read items also has to be considered. The more test-
takers have to read, the longer it will take them to complete the
measure. Thus, if the measure has time constraints, and there is quite
a bit to read, either some items will have to be discarded to allow the
measure to be completed in a reasonable time period, or the amount
that has to be read will have to be adjusted.
6.2.3.3 Answer mechanisms and protocols
For paper-based tests, decisions need to be made as to whether
items will be completed in the test booklet, or whether a separate
answer sheet (protocol) needs to be developed. Care should be taken
to design the answer protocol in such a way that it aids the scoring of
the measure, and that it is easy to reproduce.
For performance-based items and tests as well as individually
administered tests, a protocol needs to be developed for the
assessment practitioner to record answers, time taken, whether a task
was completed correctly or not, and so on.
For computer-based tests it needs to be clear how to answer the
question (e.g. click on or touch the correct response) or to complete a
task that has multiple steps (e.g. drag and drop numbers to sequence
them from smallest to largest). When the item or task has been
completed, it should also be clear to the test-taker how to move to the
next screen (e.g. touch the arrow at the bottom of a screen). For this
reason, it is good practice to routinely include practice items so that
the test-taker can become familiar with responding to examples of test
items and moving from one item to the next before the actual test
starts.
6.2.3.4 Develop administration instructions
Care needs to be taken in developing clear, unambiguous
administration instructions for the experimental try-out of the items. It
is usually advisable to pre-test the instructions on a sample of people
from the target population. Furthermore, the assessment practitioners
who are going to administer the experimental version of the measure
need to be comprehensively trained. If care is not taken during this
step of the test-development process, it could have negative
consequences for performance on the items during the experimental
pre-testing stage. For example, badly worded administration
instructions, rather than poorly constructed items, can cause poor
performance on the item. Clearly worded instructions are of critical
importance when a computerised test is developed as the
administration is fully automated and there may not be an assessment
practitioner on hand if the test-taker has queries.
6.2.3.5 Pre-test the experimental version of the measure
The measure should now be administered to a large sample
(approximately 400 to 500) from the target population. Besides the
quantitative information regarding performance on each item gathered
during this step, those assisting with the administration should be
encouraged to gather qualitative information. Information about which
items test-takers generally seemed to find difficult or did not
understand, for example, could be invaluable during the item
refinement and final item-selection phase. Furthermore, information
about how test-takers responded to the stimulus materials, the
ordering or sequencing of the items, and the length of the measure
could also prove to be very useful for the test developer.
6.2.4 The item-analysis phase
The item-analysis phase adds value to the item development and the
development of the measure in general. Simply put, the purpose of
item analysis is to examine each item to see whether it serves the
purpose for which it was designed. Item analysis helps us to
determine how difficult an item is, whether it discriminates between
good and poor performers, whether it is biased against certain groups,
and what the shortcomings of an item are. Certain statistics are
computed to evaluate the characteristics of each item. Two statistical
approaches can be followed in that the characteristics of items can be
analysed using Classical Test Theory or Item Response Theory (IRT)
or both (e.g. De Beer, 2005). The resultant statistics are then used to
guide the final item selection and the organisation of the items in the
measure.
6.2.4.1 Classical Test-Theory item analysis: Determine item difficulty (p)
The difficulty of an item (p) is the proportion or percentage of
individuals who answer the item correctly. That is,

p = (number of individuals who answered the item correctly) / (total number of individuals who attempted the item)

The higher the percentage of correct responses, the easier the item;
the smaller the percentage of correct responses, the more difficult the
item. However, p is a behavioural measure related to the frequency of
choosing the correct response, and therefore tells us nothing about
the intrinsic characteristics of the item. Furthermore, the difficulty
value is closely related to the specific sample of the population to
which it was administered. A different sample might yield a different
difficulty value. Nonetheless, one of the most useful aspects of the p-
value is that it provides a uniform measure of the difficulty of a test
item across different domains or dimensions of a measure. It is for this
reason that difficulty values are often used when selecting final items
for a measure.
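
As a minimal illustration of this calculation, the hypothetical Python sketch below computes p for each item from a matrix of dichotomously scored (0/1) responses; the function name and example data are illustrative only.

```python
import numpy as np

def item_difficulty(responses):
    """Proportion of test-takers answering each item correctly (p-values).

    `responses` is a (test-takers x items) array of 0/1 scores. A higher
    p means an easier item; note that p depends on the sample tested,
    not only on the item itself.
    """
    responses = np.asarray(responses)
    return responses.mean(axis=0)

# Example: 6 test-takers, 3 items
scores = [[1, 0, 1],
          [1, 1, 0],
          [1, 0, 0],
          [0, 1, 1],
          [1, 1, 0],
          [1, 0, 0]]
print(item_difficulty(scores))  # [0.833..., 0.5, 0.333...]
```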
6.2.4.2 Classical Test-Theory item analysis: Determine discriminating power
One of the purposes of item analysis is to
discover which items best measure the construct or content domain
that the measure aims to assess. Good items consistently measure
the same aspect that the total test is measuring. Consequently, one
would expect individuals who do well in the measure as a whole to
answer a good item correctly, while those who do poorly on the
measure as a whole would answer a good item incorrectly. The
discriminating power of an item can be determined by means of the
discrimination index and item-total correlations.
To compute the discrimination index (D), the method of extreme
groups is used. Performance on an item is compared between the
upper 25 per cent and the lower or bottom 25 per cent of the sample.
If the item is a good discriminator, more people in the upper group will
answer the item correctly.
To compute the discrimination index, the upper and lower 25 per
cent of test-takers need to be identified. Next, the percentage of test-
takers in the upper and lower groups who passed the item is
computed, and the information is substituted into the following
formula:

D = U/n_u - L/n_l   (formula 6.1)
where:
U = number of people in the upper group who passed the item
n_u = number of people in the upper group
L = number of people in the lower group who passed the item
n_l = number of people in the lower group
A positive D-value indicates an item that discriminates between the
extreme groups, while a negative D-value indicates an item with
poor discriminatory power.
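
The calculation behind formula 6.1 can be sketched in a few lines of Python. The hypothetical implementation below ranks test-takers on their total score, takes the top and bottom 25 per cent, and returns D; the names and example data are illustrative.

```python
import numpy as np

def discrimination_index(item_scores, total_scores, fraction=0.25):
    """Discrimination index D = U/n_u - L/n_l (formula 6.1).

    Compares the proportion passing the item in the top and bottom
    `fraction` of test-takers, ranked on total test score.
    """
    item_scores = np.asarray(item_scores)
    order = np.argsort(total_scores)          # ascending by total score
    n = max(1, int(len(order) * fraction))
    lower, upper = order[:n], order[-n:]
    return item_scores[upper].mean() - item_scores[lower].mean()

item = [1, 1, 0, 1, 0, 0, 1, 0]
totals = [40, 35, 12, 38, 15, 10, 36, 14]
print(discrimination_index(item, totals))  # 1.0: top quarter pass, bottom quarter fail
```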
An item-total correlation can be performed between the score on
an item and performance on the total measure. A positive item-total
correlation indicates that the item discriminates between those who do
well and poorly on the measure. An item-total correlation close to zero
indicates that the item does not discriminate between high and low
total scores. A negative item-total correlation is indicative of an item
with poor discriminatory power. In general, correlations of .20 are
considered to be the minimum acceptable discrimination value to use
when it comes to item selection.
An advantage of using item-total correlations is that correlation
coefficients are more familiar to people than the
discrimination index. Furthermore, the magnitude of item-total
correlations can be evaluated for practical significance (e.g. .4 =
moderate correlation) and can be expressed in terms of the
percentage of variance of the total score that an item accounts for
(e.g. if r = .40 the item accounts for 16 per cent of the total variance).
There is a direct relationship between item difficulty and item
discrimination: the difficulty level of an item restricts its discriminatory
power. Items with p-values around .5 have the best
potential to be good discriminators.
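
Item-total correlations can likewise be computed directly, as the hypothetical sketch below shows. It uses the 'corrected' variant, in which each item is excluded from its own total so that the correlation is not artificially inflated; this refinement is common practice rather than something prescribed in the text.

```python
import numpy as np

def corrected_item_total(responses):
    """Correlation of each item with the total of the remaining items.

    Values near or below zero flag items with poor discriminatory power;
    about .20 is often taken as the minimum acceptable value. Assumes
    every item has some variance in the sample.
    """
    responses = np.asarray(responses, dtype=float)
    totals = responses.sum(axis=1)
    return np.array([
        np.corrcoef(responses[:, j], totals - responses[:, j])[0, 1]
        for j in range(responses.shape[1])
    ])

scores = [[1, 0, 1], [1, 1, 0], [1, 0, 0],
          [0, 1, 1], [1, 1, 0], [1, 0, 0]]
print(corrected_item_total(scores).round(2))  # one correlation per item
```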
6.2.4.3 Item Response Theory (IRT)
By using Item Response Theory, the difficulty level and discriminatory
power of an item can be even more accurately determined. In IRT the
item parameters are sample invariant. That is, the parameters are not
dependent on the ability level of the test-takers responding to the item
and are thus ‘a property of the item and not of the group that
responded to it’ (De Beer, 2005, p. 723).
In IRT an item-response curve (see Figure 7.1 in Chapter 7) is
constructed by plotting the proportion of test-takers who gave the
correct response against estimates of their true standing on a latent
trait (e.g. ability) (Aiken & Groth-Marnat, 2005). The estimates are
derived through logistic equations, with the specific equations used
varying according to the estimation procedures used. In the one-
parameter or Rasch model, an item-response curve is constructed by
estimating only the item difficulty parameter (b). In the two-parameter
model, an item-response curve is constructed by estimating both
difficulty (b) and discrimination (a) parameters. This is the most
popular model, especially since it makes it possible to compare the
difficulty level and discriminating power of two items, for example
(Aiken & Groth-Marnat, 2005). The three-parameter model allows for
variations among the items in terms of difficulty level (b),
discrimination (a), and for guessing (c) on multiple-choice items (De
Beer, 2005).
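
The logistic equations referred to above can be written compactly. The sketch below implements the three-parameter logistic form; with c fixed at zero it reduces to the two-parameter model, and with a also held constant across items it reduces to the one-parameter (Rasch) form. Some treatments include a scaling constant of 1.7 in the exponent, which is omitted here for simplicity.

```python
import numpy as np

def irt_prob(theta, a=1.0, b=0.0, c=0.0):
    """Probability of a correct response under the logistic IRT models.

    a, b, and c are the discrimination, difficulty, and guessing
    parameters respectively; theta is the latent trait (e.g. ability).
    """
    theta = np.asarray(theta, dtype=float)
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

ability = np.linspace(-3, 3, 7)
print(irt_prob(ability, a=1.2, b=0.5, c=0.2))  # rises from about c towards 1
```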
Especially when you are developing a measure in a multicultural,
multilingual country such as South Africa, it is useful to investigate
item bias in the early stages of developing a measure. IRT also
makes it possible to explore differential item functioning (DIF) in a
much more sophisticated way to identify items that may be biased or
unfair for test-takers from certain groups. One of the common
procedures for detecting DIF is to calculate the area between the item
characteristic curves for two groups (e.g. males and females, or
isiXhosa and isiZulu speakers). The larger the area, the more likely it
is that the item shows DIF. However, as De Beer (2005) points out, it
remains a subjective decision regarding how large the area should be
for DIF to be evident. Usually a combination of visual inspection and
empirically estimated cut-off points is used to flag items that exhibit
DIF. Kanjee (2007) also points out that DIF procedures are only
applicable when two groups are compared. He thus suggests that
logistic regression can be used to simultaneously explore bias when
multiple groups need to be compared. If you are interested in reading
more about how IRT is used for item-analysis purposes during test
development, De Beer (2005) provides an example of how the items
of the LPCAT were analysed and refined using IRT. DIF will also be
discussed in more detail in Chapter 7.
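
The area-between-curves procedure for flagging DIF can be approximated numerically. The hypothetical sketch below does this for a single item whose parameters have been estimated separately in two groups; the grid range and the 'large area' judgement remain analyst decisions, as noted above.

```python
import numpy as np

def icc(theta, a, b, c=0.0):
    """Item characteristic curve under the 3PL logistic model."""
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

def dif_area(params_g1, params_g2, lo=-4.0, hi=4.0, n=2001):
    """Approximate unsigned area between two groups' ICCs for one item.

    Each params tuple is (a, b, c) for the same item estimated
    separately in each group; a larger area suggests possible DIF.
    """
    theta = np.linspace(lo, hi, n)
    gap = np.abs(icc(theta, *params_g1) - icc(theta, *params_g2))
    return gap.sum() * (theta[1] - theta[0])  # simple Riemann approximation

# Same item, estimated per group; a pure difficulty shift of 0.8
print(dif_area((1.0, 0.0, 0.0), (1.0, 0.8, 0.0)))  # close to 0.8
```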
6.2.4.4 Identify items for final pool
Classical test theory, IRT and DIF analyses can be used to determine
item parameters based on which items should be included in the final
version of the measure, or which should be discarded. Available
literature should be consulted to set item-parameter values that must
be met for inclusion or exclusion. For example, if the three-parameter
IRT model is used, a-values should fall between 0.8 and 2.0 while c-
values should fall between 0.0 and 0.3 for an item to be included
(Hambleton & Swaminathan, 1985).
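
Applying such item-parameter cut-offs is mechanically simple, as the illustrative sketch below shows using the ranges quoted above; as the next paragraph cautions, a flag of this kind should inform, rather than replace, the test developer's judgement.

```python
def passes_cutoffs(a, c, a_range=(0.8, 2.0), c_range=(0.0, 0.3)):
    """Inclusion rule in the style quoted from Hambleton & Swaminathan
    (1985): keep an item only if its discrimination (a) and guessing (c)
    parameters fall within the stated ranges."""
    return a_range[0] <= a <= a_range[1] and c_range[0] <= c <= c_range[1]

# Hypothetical (a, c) estimates for three items
items = {"item1": (1.1, 0.18), "item2": (0.4, 0.22), "item3": (1.6, 0.35)}
final_pool = [name for name, (a, c) in items.items() if passes_cutoffs(a, c)]
print(final_pool)  # ['item1']
```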
A word of caution, however, is necessary. Item analysis is a
wonderful aid that the test developer can use to identify good and bad
items and to decide on the final item pool. Performance on an item is,
however, impacted on by many variables (e.g. the position of an item
or the administration instructions). Consequently, item-analysis
statistics should not necessarily replace the skill and expertise of the
test developer. At times, the item-characteristic statistics may be poor,
but the test developer could motivate the inclusion of the item in the
final pool on logical or theoretical grounds. In a similar vein, the item
statistics may be very impressive, but on closer inspection of the item,
the test developer may feel that there is no logical or theoretical
reason for retaining the item.

6.2.5 Revise and standardise the final version of the measure
Having gathered quantitative and qualitative information on the items
and format of the experimental version of the test, the next phase
focuses on revising the items and test, and then administering the final
version of the test to a large sample for standardisation purposes.
6.2.5.1 Revise the items and test
Items identified as being problematic during the item-analysis phase
need to be considered and a decision needs to be made for each one
regarding whether it should be discarded or revised. Where items are
revised, the same qualitative review process by a panel of experts, as
well as experimental try-outs of the item, should be done.
6.2.5.2 Select items for the final version
The test developer now has a pool of items that has been reviewed by
experts and on which empirical information regarding item difficulty,
discrimination, and bias has been obtained. Based on this information,
the selection of items for the final measure takes place. After the
selection of items, the existing database can be used to check on the
reliability and validity coefficients of the final measure, to see whether
they appear to be acceptable.
6.2.5.3 Refine administration instructions and scoring procedures
Based on the experience and feedback gained during the pre-testing
phase, the administration and scoring instructions might need to be
modified.
6.2.5.4 Administer the final version
The final version is now administered to a large, representative
sample of individuals for the purposes of establishing the
psychometric properties (validity and reliability), and norms.

6.2.6 Technical evaluation and establishing norms


6.2.6.1 Establish validity and reliability
The psychometric properties of the measure need to be established.
Various types of validity and reliability coefficients can be computed,
depending on the nature and purpose of the measure. We dealt with
the different types of validity and reliability and how they are
established in detail in Chapters 4 and 5. It also needs to be
empirically established whether or not there is construct and
measurement/scalar equivalence across subgroups in the target
population. For further information in this regard, consult research
conducted on the Myers-Briggs Type Indicator (Van Zyl and Taylor,
2012) and the PiB/SpEEx Observance Test (Schaap, 2011).
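
As one small illustration, an internal-consistency reliability coefficient such as Cronbach's alpha can be computed from the standardisation data. The sketch below is a minimal implementation under the standard alpha formula; which coefficients are appropriate depends, as noted, on the nature and purpose of the measure.

```python
import numpy as np

def cronbach_alpha(responses):
    """Cronbach's coefficient alpha for a (test-takers x items) matrix.

    alpha = k/(k-1) * (1 - sum of item variances / variance of totals),
    where k is the number of items. Assumes k > 1 and non-zero total
    variance.
    """
    x = np.asarray(responses, dtype=float)
    k = x.shape[1]
    item_vars = x.var(axis=0, ddof=1)
    total_var = x.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Hypothetical ratings of 4 test-takers on 3 items
scores = [[4, 5, 4], [2, 3, 3], [5, 5, 4], [1, 2, 2]]
print(round(cronbach_alpha(scores), 2))  # ~0.96 for this toy data
```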
6.2.6.2 Establish norms, set performance standards or cut-scores
If a norm-referenced measure is developed, appropriate norms need
to be established. This represents the final step in standardising the
measure. An individual’s test score has little meaning on its own.
However, by comparing it to that of a similar group of people (the
norm group), the individual’s score can be more meaningfully
interpreted. Common types of norm scales were discussed in Chapter
3. Furthermore, in Chapter 14 (Section 14.2) it is indicated that as
IRT-based and Rasch-scaled tests are ‘distribution-free’, norms are
not needed to interpret test scores, although assessment practitioners
require intensive training in such interpretation.
If a criterion-referenced measure is used, cut-scores or
performance standards need to be set to interpret test performance
and guide decision-making. We discussed cut-scores and
performance standards in Chapter 3.
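
To illustrate what norm-referenced interpretation involves computationally, the hypothetical sketch below derives two common norm scores, percentile ranks and z-scores, from a norm group's raw scores; real norm tables would of course be built from a large, representative standardisation sample.

```python
import numpy as np

def percentile_rank(norm_sample, score):
    """Percentage of the norm group scoring at or below `score`."""
    norm = np.asarray(norm_sample)
    return 100.0 * np.mean(norm <= score)

def z_score(norm_sample, score):
    """Standard (z) score of `score` relative to the norm group."""
    norm = np.asarray(norm_sample, dtype=float)
    return (score - norm.mean()) / norm.std(ddof=1)

norm_group = [12, 15, 18, 20, 22, 25, 25, 28, 30, 34]
print(percentile_rank(norm_group, 25))     # 70.0
print(round(z_score(norm_group, 25), 2))   # ~0.31
```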

6.2.7 Publish and refine continuously


Before a measure is published, a test manual must be compiled and
then submitted for classification. Once a measure has been published
and ongoing research into its efficacy and psychometric properties is
conducted, revised editions of the manual should be published to
provide updated information to assessment practitioners.
6.2.7.1 Compile the test manual
The test manual should:
Specify the purpose of the measure.
Indicate to whom the measure can be administered.
Provide practical information such as how long it takes to administer
the measure, whether a certain reading or grade level is required of
the test-taker, whether the assessment practitioner requires training in
its administration, and where such training can be obtained.
Specify the administration and scoring instructions.
Outline, in detail, the test-development process followed.
Provide detailed information on the types of validity and reliability
information established, how this was done, and what the findings are.
Provide information about the cultural appropriateness of the
measure, and the extent to which test and item bias has been
investigated.
Provide detailed information about when and how norms were
established and norm groups were selected. A detailed description of
the normative sample’s characteristics must be provided (e.g. gender,
age, cultural background, educational background, socio-economic
status, and geographic location).
Where appropriate, provide information about how local norms and
cut-off scores could be established or augmented by assessment
practitioners to enhance the criterion-related or predictive validity of
the measure.
Indicate how performance on the measure should be interpreted.
6.2.7.2 Submit the measure for classification
As the use of psychological measures is restricted to the psychology
profession, it is important to submit the measure to the Psychometrics
Committee of the Professional Board for Psychology so that it can be
determined whether the measure should be classified as a
psychological measure or not. In Chapter 8 we will elaborate on the
reasons for classifying a measure and the process to be followed.
6.2.7.3 Publish and market the measure
Finally, the measure is ready to publish and market. In all marketing
material, test developers and publishers should take care not to
misrepresent any information or to make claims that cannot be
substantiated. For example, if the norms for a measure have not been
developed for all cultural groups in South Africa, the promotional
material should not claim that the measure can be administered to all
South Africans. A clear distinction should also be made between
marketing and promotional material and the test manual – they are not
one and the same thing. As outlined in Section 6.2.7.1, the test
manual must contain factual information about the measure, rather
than act as a selling device that tries to put the measure in a
favourable light.
As the use and purchase of psychological measures is restricted to
the psychology profession in South Africa (see Chapter 8), test
developers and publishers need to market the measures to the
appropriate target market. Furthermore, promotional material should
not provide examples of actual test items or content, as this could
invalidate their use if this information were to be released in the
popular media.
Increasingly, test developers and publishers are beginning to set
standards for those who purchase and use their measures. Such
standards could include insisting that assessment practitioners
undergo special training in the use of the measure, as well as follow-
up activities whereby test developers and publishers check that the
assessment practitioners are using and interpreting the measure
correctly, and fairly.
6.2.7.4 Revise and refine continuously
There is no rule of thumb regarding how soon a measure should be
revised. It largely depends on the content of the measure. When item
content dates quickly, more frequent revisions may be necessary. A
further factor that influences the timing of revisions is the popularity of
a measure. The more popular a measure, the more frequently it is
researched. From the resultant database, valuable information can be
obtained regarding possible refinements, alterations in administration
and scoring procedures, and so on. Test developers usually wait until
a substantial amount of information has been gathered over a five- to
10-year period regarding how the measure needs to be revised before
they undertake a full-scale revision and renorming process.

» CRITICAL THINKING CHALLENGE 6.2

Construct a flow diagram of the steps involved in developing and


publishing a measure.

» ETHICAL DILEMMA CASE STUDY 6.1

The test agency that is developing the South African Developmental


Screening Test (SADST) (see sections 6.2.1.2 and 6.2.1.3) approaches
Jeremy, an industrial psychologist who specialises in adult development,
to head up the three-person test-development team consisting of a
remedial teacher, an occupational therapist, and a speech therapist.

Questions
(a) Why do you think that the test agency approached Jeremy? [Hint: consult
the Health Professions Act and the Ethical Rules for psychologists
regarding the nature and development of psychological tests.]
(b) How should Jeremy respond?
6.3 Evaluating a measure
Test developers and publishers have the responsibility of ensuring
that measures are developed following a rigorous methodology, and
of providing information to verify the measure’s validity and reliability.
Assessment practitioners have the responsibility of evaluating the
information provided about a measure and weighing up whether the
measure is valid and reliable for the intended purpose. Assessment
practitioners can use the information provided in Section 6.2 on the
development of a measure to identify the key technical and
psychometric aspects that they should keep in mind when they
evaluate a measure. When evaluating a measure, assessment
practitioners should, among other things, consider the following:
How long ago a measure was developed, as the appropriateness of
the item content and the norms changes over time
The quality and appeal of the test materials. For computerised tests,
an evaluation of the quality of the test materials includes looking at
how easy it is to make a response; the clarity of graphical and digital
content; the quality of design of the user interface and what the
usability studies indicate; and how robust the design of the software is
(e.g. are responses stored on an ongoing basis so that if the power
fails or Internet connection is lost the test-taker restarts the test at the
point that they were at when the problem occurred?)
The quality of the manual contents (e.g. does it indicate the purpose of
the measure, to whom it should be applied, what the limitations are,
how to administer, score, and interpret the measure)
The clarity of the test instructions (especially where there are different
language versions or it is a computerised test that will be administered
in an automated way)
The cultural appropriateness of the item content and the constructs
being tapped (e.g. has there been an empirical investigation into item
bias, differential item functioning, and construct equivalence for
different groups?)
If the measure is to be used in a multicultural and/or multilingual
context, whether bias investigations provide evidence that the
performance of various groups on the measure is equivalent
The adequacy of the psychometric properties that have been
established
• The nature of the norm groups and the recency of the norms.

Assessment practitioners should use available information, both local


and international, to develop their own rules of thumb when it comes to
assessing the quality of assessment measures. Consult the Web site of
the European Federation of Psychologists’ Associations
(www.efpa.eu), or that of the British Psychological Society’s
Psychological Testing Centre (PTC) (www.psychtesting.org.uk) for
information on a detailed test-review system.

» CRITICAL THINKING CHALLENGE 6.3

Develop your own set of guidelines to use when evaluating psychological assessment measures. Try to develop your guidelines in the form of a checklist, as this will be more practical and save you valuable time in practice.

CHECK YOUR PROGRESS 6.1


6.1 In educational assessment, test plans are based on the content of a
chapter or module. Develop a test plan for a multiple-choice measure
that covers the content of this chapter. Your test plan should include:
• Definitions of the constructs (learning outcomes) to be measured
• The content to be covered
• The number of questions per learning outcome
• The item and response format
• How the test will be administered and scored.

6.4 Conclusion
In this chapter you have learned about the steps involved in
developing a psychological test, and the psychometric properties and
item characteristics that need to be investigated in the process. The
chapter concluded with a discussion on how to evaluate a measure.
How’s the journey going? There’s only one more stop left in the
Foundation Zone. In the last stretch of uphill climbing for a while, you
will be introduced to issues related to the development and adaptation
of measures in cross-cultural contexts.
Chapter 7
Cross-cultural test adaptation, translation, and tests in multiple languages
FRANCOIS DE KOCK AND CHERYL FOXCROFT

CHAPTER OUTCOMES
By the end of this chapter you will be able to:
❱ list reasons why assessment measures are adapted
❱ describe what test adaptation is and the different types of adaptation
❱ explain the concepts of bias and equivalence, which are fundamental to
understanding the different designs and methods used to adapt measures
❱ describe statistical methods for detecting bias and equivalence
❱ discuss challenges encountered when adapting measures in
South Africa and suggest possible solutions
❱ describe the simultaneous multicultural test-development process.

7.1 Introduction
As you saw in Section 2.4 in Chapter 2, the history of psychological
test development in South Africa was significantly shaped by our
racially segregated society that resulted from apartheid and earlier colonial influences. As a result, locally developed measures are generally not available in the range of languages spoken by South African test-takers, or are not sufficiently culturally sensitive. In addition, not only are measures largely Western-oriented rather than African-centred in the constructs they assess, but many of the measures that were developed in Europe or the United States have also been uncritically adopted for use here. The challenge of providing linguistically and
culturally relevant measures in a multicultural and multilingual society
is not limited to South Africa. With increased globalisation, it is an
international challenge. As was discussed in Section 2.3.5 in Chapter
2, there are two ways that this challenge can be addressed. On the
one hand, tests developed in one culture can be adapted for use for
test-takers from other cultural and language groups. On the other
hand, many non-Western countries are developing their own
indigenous measures in their cultural contexts. This chapter will
largely focus on test adaptation, with a short section on multicultural
test development. The reason for this focus is that the adaptation of
assessment measures is essential in South Africa if test results are to
be valid and reliable for test-takers from diverse backgrounds.
Adapting measures is certainly a daunting and challenging task
that is not only costly and time-consuming, but also requires high
levels of expertise. In this chapter, we will discuss the reasons for adapting measures, the aspects to consider when doing so, and how to investigate test equivalence and bias. The challenges
in adapting measures for the multicultural and multilingual South
African context will further be highlighted and some solutions will be
proposed. In addition, the possibility of developing indigenous
multiple-language versions of a measure, using a simultaneous
multicultural approach, will be explored.

7.2 Reasons for adapting measures
Some of the most important reasons for adapting assessment measures include the following:
• To enhance fairness in assessment by allowing test-takers to be assessed in the language of their choice, thereby minimising possible bias associated with assessing individuals in their second or third language and increasing the validity of results. This is important, legally speaking, given that it is unlawful (in South Africa) to use tests where no scientific evidence exists that such measures 'can be applied fairly to all employees; and is not biased against any employee or group' (Employment Equity Act, No. 55 of 1998).
• To reduce costs and save time. Instead of 'reinventing the wheel', it
is often cheaper and easier to translate and adapt an existing
measure than to develop a new measure. In addition, in situations
where there is a lack of resources and technical expertise, adapting
measures might be the only alternative for ensuring a valid and
reliable measure.
• To facilitate comparative studies between different language and
cultural groups, both at an international and national level. This has
become especially relevant in recent years with the growing contact
and cooperation between different nations in economic, educational,
and cultural spheres. This has resulted in an increased
need for many nationalities and groups to know and learn more
from and about each other, and for multinational organisations to
assess people from diverse backgrounds for selection and training
purposes using adapted versions of the same measures.

With increased diversification of cultures in countries and the globalisation of world trade, there has been a marked increase in cross-cultural assessment and test adaptation (Casillas & Robbins, 2005;
Hambleton & Zenisky, 2010). In fact, cross-cultural comparative
research studies have increased in the first decade of the twenty-first
century and this increase is expected to continue (Byrne, Oakland,
Leong, Van de Vijver, Hambleton, Cheung & Bartram, 2009).

7.3 Test adaptation
Test adaptation involves making changes to a measure developed in one context to enhance its 'cultural and linguistic adequacy' (Van de Vijver, 2016, p. 364) when it is used in another context.

For an adapted measure to be used with confidence, there needs to be evidence that the adapted version is equivalent to the original
(source) version, especially if the performance of individuals from
diverse groups is to be compared using the different adapted versions.
This section will focus on aspects of a measure that require
adaptation, and potential challenges encountered. Sections 7.4 and
7.5 will focus on bias and equivalence.

7.3.1 Types of test adaptation
Sometimes the only adaptation needed is to translate the test from one language into one or more other languages (e.g. from English to Setswana and isiXhosa), while still retaining the original meaning. However, in some instances, just translating a
measure may be insufficient as some words, context, examples, and
so on need to be changed to be more relevant and applicable to a
specific country, language, and/or cultural group. Whether the
construct is theoretically conceptualised in the same way in the source
(original) cultural context and target cultural context is also something
that must be considered when adapting a measure.
Van de Vijver (2016, p. 370) identifies eight kinds of test
adaptations, namely: ‘concept-driven adaptation’; ‘theory-driven
adaptation’; ‘terminological/factual-driven adaptation’; ‘norm-driven
adaptation’; ‘linguistics-driven adaptation’; ‘pragmatics-driven
adaptation’; ‘familiarity/recognisability-driven adaptation’; and ‘format-
driven adaptation’. Table 7.1 provides some information on these
various kinds of adaptations.
Table 7.1 Kinds of test adaptations

Concept-driven: Changes to item content because of differences in culture-specific concepts. For instance, for a general-knowledge item that asks how many players there are in an ice-hockey team, 'ice hockey' would be changed to a sport that is more familiar to all South Africans, such as soccer.

Theory-driven: The theory on which an item/measure is based needs to be consulted when translating the measure into a different language. For instance, if digits are to be memorised, the fewer the number of syllables needed to pronounce a digit, the more working-memory space is available to store the digits. If the digits are directly translated from the source to the target language, this could result in digits that are pronounced using a greater number of syllables in the target-language version, which would make the task more difficult (Van de Vijver, 2016).

Terminological/factual-driven: Specific terms or characteristics that differ from country to country need to be accommodated. For instance, in a South African adaptation of a test developed in the UK, references to the British pound would be changed to the South African rand.

Norm-driven: Local cultural norms, values, and practices need to be accommodated. For instance, cultures differ widely regarding whether eye contact may be made with an elder. This has implications for test administration, and also for the design of visual stimuli.

Linguistics-driven: Structural differences between languages need to be accommodated. For instance, some languages have gender-specific nouns, while others do not.

Pragmatics-driven: Involves adaptations to accommodate culture-specific conventions. For instance, direct translations could result in test instructions being perceived as too direct or impolite. An adaptation would be needed for the wording to be culturally sensitive.

Familiarity/recognisability-driven: Cultures differ in their familiarity with assessment procedures, and this needs to be accommodated. For instance, when tapping non-verbal reasoning in rural African children, two-dimensional drawings of visual patterns are unfamiliar, while patterns formed with objects are more familiar (see Section 6.2.2.1).

Format-driven: The appropriateness of item formats or response types differs across cultures. For instance, many measures of intellectual ability impose time limits. In some cultures it is commonly accepted that the brighter students are the ones who complete tasks first, while in other cultures intellectual ability is associated with thoughtfulness and careful consideration of one's response. Consequently, the weight given to speed as a factor in cognitive ability differs across cultures. Measures that have time restrictions can thus place some test-takers at a severe disadvantage. The best solution to this problem is to minimise speed as a factor by ensuring that there is adequate time to complete the measure.

SOURCE: Table adapted from Van de Vijver (2016, p. 370).

It is considered to be good practice for the adaptation of a measure to involve a combination of conceptual, cultural, linguistic, and
measurement aspects, and perspectives highlighted in Table 7.1 (Van
de Vijver, 2016). To give effect to this, a multidisciplinary team is
needed to spearhead the adaptation. The adaptation team should
have ‘expertise in the construct, the target languages, and the cultures
… as well as measurement issues’ (Van de Vijver, 2016, p. 369).
7.3.2 Challenges related to test adaptation in South Africa
While the previous sections in this chapter have outlined the ‘why’ and
‘types’ of test adaptation in theory, you might want to know more
about the practice of test adaptation in South Africa. This sub-section will outline some of the challenges, the reasons for them, and some potential solutions.
The task of adapting measures and translating them into multiple languages presents researchers in South Africa, and elsewhere in the world, with a number of challenges. For example, Van Eeden and
Mantsha (2007) highlight a number of translation difficulties
encountered when translating the 16 Personality Factor Questionnaire
Fifth Edition (16PF5) into TshiVenda from English. They note that
such translation difficulties could result in inaccuracies that change the
meaning of an item and make it difficult for the test-taker to
understand the item. Among the translation difficulties were:
• the fact that there is no equivalent term for a concept in the target language. For example, there is no TshiVenda term for 'depression', which made the literal translation of the item 'I don't let myself get depressed over little things' problematic. Other researchers have similarly found that restricted vocabulary makes it difficult to translate personality descriptors into isiXhosa (Horn, 2000) and Tsonga (Piedmont, Bain, McCrae & Costa, 2002)
• the fact that idiomatic expressions cannot be translated literally
• the use of the negative form (e.g. 'I don't let myself get depressed'), which often confuses test-takers.
To this list, Steele and Edwards (2008) add that there might be
differences in the terms and expressions with which the subgroups
(for example rural and urban) within a language group are familiar.
This could impact on test translation accuracy if the translators are not
representative of the key subgroups.
A note of caution needs to be sounded: given that the majority of South Africans are not educated in their mother tongue, it is not always easy to determine the one language in which test-takers are most proficient for assessment purposes. In
many instances bilingual assessment might be more appropriate as
test-takers may be able to comprehend and respond to items that
draw on previously learned knowledge using the language in which they were educated, but find it easier to respond to novel problem-solving items using their mother tongue. However, to be able to use bilingual assessment, assessment practitioners need multiple-language versions of measures and have to be proficient in both languages in which a measure is administered. If not, they could use a fellow practitioner, a trained assistant, or a translator who is
proficient in the language of assessment to assist them during test
administration.
Furthermore, Van Eeden and Mantsha (2007, p. 75) note that ‘it is
clear that the problems experienced at the item level when testing in a
multicultural context are not merely related to language but also
involve a cultural component.’ This is an important reminder that the
content of items could be culture-bound and that constructs could
manifest differently in different groups. Even when international measures are adapted for use in the multicultural South African context, with biased items removed and language issues addressed, problems can remain: Meiring, Van de Vijver and Rothmann (2006) found that, although bias was reduced for the adapted version of the 15 Factor Questionnaire (15FQ), internal consistency reliabilities remained low for the black sample. Consequently, in the domain of personality assessment in particular, in an attempt to find a measure that is appropriate for the different cultures in South Africa, attention is shifting to the development of an indigenous measure, the South African Personality Inventory (SAPI), which includes both universal and unique dimensions of personality (Nel et al., 2012; Fetvadjiev et al., 2015) (see Sections 6.2.1.3 and 7.7).

7.4 Equivalence of adapted versions of measures used in multicultural and multilingual contexts
In this section, we discuss the importance of the equivalence of the
original (source) measure and adapted version(s). In addition, we
discuss the use of judgemental and related approaches to explore
equivalence of the translated aspect of an adapted measure,
sometimes referred to as ‘linguistic equivalence’ (Cheung &
Fetvadjiev, 2016, p. 333). Statistical approaches to establishing
equivalence will be discussed in Section 7.5.

7.4.1 Equivalence of measures adapted for cross-cultural use
The attainment of equivalent measures is perhaps the central issue in
cross-cultural test adaptation and comparative research (Cheung &
Fetvadjiev, 2016; Van de Vijver, 2016; Van de Vijver & Leung, 1997;
Van de Vijver & Tanzer, 2004). For measures to be equivalent,
individuals with the same or similar standing on a construct (such as
learners with high mathematical ability) but belonging to different
groups (such as isiXhosa- and Afrikaans-speaking) should obtain the
same or similar scores on the different language versions of the
measure. If not, the items are said to be biased and the two versions
of the measure are non-equivalent. If the basis of comparison is not
equivalent across different measures, valid comparisons across these
measures and/or groups cannot be made, as the test scores are not
directly comparable. In such a situation, it is uncertain whether these
scores represent valid differences or similarities in performance
between test-takers from different cultural or language groups, or
merely measurement differences caused by the inherent differences in
the adapted measures themselves. However, it must be noted that
even if scores are comparable, it cannot be assumed that the
measure is free of any bias. Measures therefore have to be equivalent
if comparisons are to be made between individuals belonging to
different subgroups. While issues of equivalence and bias, and how
to investigate them, are comprehensively discussed in Section 7.5,
how to explore linguistic equivalence is discussed below.
7.4.2 Exploring linguistic equivalence
Having translated a measure as one aspect of adapting it, it is
important to explore the linguistic equivalence of the original and
translated versions. Linguistic equivalence ‘refers to items conveying
the same literal meaning’ (Cheung & Fetvadjiev, 2016, p. 333) in the
two versions. We will look at the use of judgemental approaches and
empirical investigations to establish the equivalence of translated
measures. However, we need to keep in mind that adapting a
measure involves more than just translating appropriate items. Items
can also be modified and even discarded, and additional ones
developed to ensure that the adapted measure is culturally sensitive.
For this reason, it is likely that there will be some differences between
the original and translated versions. However, if the translations are of
a good quality, which the approaches mentioned below help us to
assess, this will contribute to the extent to which the original and
adapted versions are equivalent.
7.4.2.1 Judgemental approaches
Judgemental approaches are based on a decision by an individual, or
a group of individuals, on the subjective degree to which the two
measures are similar. Usually judges are individuals who have
relevant experience and expertise in test content and measurement
issues, and are familiar with the culture and language of the groups
under consideration. The common approaches used are forward-
translation and back-translation.
When using the most common forward-translation approach, the
source version of a measure, which is referred to as the original
language source, is translated into the target language (i.e. the
language into which the measure is translated) by a single translator
or a group of translators. Thereafter, individual bilingual experts (judges), or preferably a group of them, compare the source and target versions of the measure to determine whether the two versions are
equivalent. These comparisons can be made on the basis of having
the judges do the following:
• simply look the items over
• check the characteristics of items against a checklist of item characteristics that may indicate non-equivalence
• attempt to answer both versions of the item before comparing them for errors (Hambleton & Bollwark, 1991).

The forward-translation approach, however, has particular disadvantages:
• it is often difficult to find bilingual judges who are equally familiar with the source and target languages and/or culture
• due to their expertise and experience, bilingual judges may inadvertently use insightful guesses to infer equivalence of meaning
• bilingual judges may not think about the item in the same way as the respective source and target-language test-takers and thus the results may not be generalisable.

In the back-translation approach, the original measure is first translated into the target language by a set of translators, and then
translated back into the original language by a different set of
translators (Brislin, 1986). Equivalence is usually assessed by having
source language judges check for errors between the original and
back-translated versions of the measure. For example, a test is
translated from isiZulu to Setswana, back-translated into isiZulu by
different translators, and the two isiZulu versions are assessed for
equivalence by a group of judges with expertise in isiZulu. The main
advantage of this design is that researchers who are not familiar with
the target language can examine both versions of the source
language to gain some insight into the quality of the translation
(Brislin, 1986). Also, this design can easily be adapted such that a
monolingual researcher (assessment or subject specialist) can
evaluate and thus improve the quality of the translation after it has
been translated into the target language, but before it is back-
translated into the source language.
The main disadvantage of the back-translation approach is that the
evaluation of equivalence is carried out in the source language only. It
is quite possible that the findings in the source language version do
not generalise to the target language version of the measure. Another
disadvantage is that the assumption that errors made during the
original translation will not be made again during the back-translation
is not always applicable (Hambleton & Bollwark, 1991). Skilled and experienced translators often use insight to produce back-translated items that appear equivalent to the originals, even when the target-language items are not. This, however, can
be controlled by either using a group of bilingual translators or a
combination of bilinguals and monolinguals to perform multiple
translations to and from the target and source languages (Bracken &
Barona, 1991; Brislin, 1986). For example, Brislin (1986) suggests the
use of monolinguals to check the translated version of the measure
and make necessary changes before it is back-translated and
compared to the original version. Once the two versions of a measure
are as close as possible, Bracken and Barona (1991) argue for the
use of a bilingual committee of judges to compare the original or back-
translated and the translated version of the measure to ensure that the
translation is appropriate for the test-takers.
7.4.2.2 Empirical investigations
Two main approaches are used to empirically investigate the
equivalence of translated measures. These approaches are linked to
the characteristics of participants, that is, bilingual and monolingual
speakers, as well as to the version of the translated measure, that is,
original, translated, or back-translated (Hambleton & Bollwark, 1991).
In the bilingual test-taker approach, both the source and target
versions of the measures are administered to test-takers who speak
both the source and target languages, before statistically comparing
the two sets of scores as well as the ordering of the item difficulties in
the two versions. The advantage of this approach is that, since the
same test-takers take both versions of the measure, differences in the
abilities of test-takers that can confound the evaluation of translation
equivalence will be controlled (Hambleton & Bollwark, 1991). The
disadvantage of this approach is that, due to time constraints, test-
takers might not be able to take both versions of the instruments. A
more serious problem, however, is that the results obtained from
bilingual test-takers may not be generalisable to the respective
source-language monolinguals (Hambleton, 1993).
Another approach used is where source-language monolinguals
take the source version and target-language monolinguals take the
target version of a measure (Brislin, 1986; Ellis, 1989, 1991). The two
sets of scores are then compared to determine the equivalence
between the two versions of the measure. The main advantage of this
approach is that, since both source and target-language monolinguals
take the versions of the measure in their respective languages, the
results are more generalisable to their respective populations. A major
problem, however, is that the resulting scores may be confounded by
real ability differences in the groups being compared, since two
different samples of test-takers are compared (Hambleton, 1993). To
minimise this problem, test-takers selected for the groups should be
matched as closely as possible on the ability/abilities of interest and
on criteria that are relevant to the purpose of assessment.
Furthermore, conditional statistical techniques that take into account
the ability of test-takers when comparing test scores can also be used
to control for differences in ability between the source and target test-
taker samples. Some of these techniques are discussed in Section
7.5.

7.5 Statistical approaches to establish equivalence
As noted in Section 7.4.1, it is important that measures used in
multiple populations are equivalent. In other words, psychological
measures should yield scores that mean the same thing across
different groups. We should consistently examine whether an
assessment measure functions similarly across groups, for reasons that may be legally, practically, socially, or theoretically important (Aguinis, Culpepper & Pierce, 2010; Zickar, Cortina & Carter, 2010). In
this section, we will start out by giving you a broad overview of the
concepts of bias, equivalence, and invariance, before we address the
main statistical techniques used to determine equivalence.
7.5.1 Bias and equivalence
To meaningfully interpret differences on psychological measures
between test-takers from different groups, we have to consider the
threat of bias and non-equivalence (Van de Vijver & Matsumoto,
2011). Bias refers to any systematic error or noise in test scores that
is caused by irrelevant factors (i.e. any factor other than the construct
that we are trying to measure with a particular test). In other words,
bias is any construct-irrelevant source of variance that results in
systematically higher or lower scores for identifiable groups of test-
takers (American Educational Research Association, American
Psychological Association & National Council for Measurement in
Education, 1999). Bias is closely related to the concept of
equivalence. Johnson (1998) reviewed a wide variety of
interpretations of equivalence and concluded that they can be grouped
into those that focus on equivalence of meaning, and others that focus
on measures and procedures used to make comparisons between
groups of test-takers. Van de Vijver and Leung (2011) suggest an
integrative taxonomy of equivalence and bias. They interpret bias and
equivalence as opposite sides of the same coin, albeit with
distinguishing foci: ‘Bias threatens the equivalence of measurement
outcomes across cultures, and it is only when instruments are free
from bias that measurement outcomes are equivalent, and have the
same meaning within and across cultures' (p. 19). In their view, cross-group equivalence requires a lack of bias, and the presence of cross-group bias inevitably results in some form of non-equivalence. Bias is a
broad term that encompasses various construct-irrelevant sources of
variance, whereas equivalence generally refers to a lack of bias that is
due to group-related effects, specifically. As such, equivalence may be
better understood as a lack of group-related bias, i.e. where bias in
measurement lies primarily in some systematic difference in
measurement between comparison groups. However, in South African
practice, the term ‘equivalent’ normally is used to mean that a test is
free from group bias (in how it measures a particular construct across
populations).
Finally, the terms equivalence and invariance are often used
interchangeably depending on the functional discipline from which the
researcher approaches the statistical analysis. In cross-cultural
psychology, the term equivalence is more often used, although its
implied meaning often includes more substantive cross-cultural issues
(such as the meaning of constructs that may or may not exist across
groups) and not only measurement aspects. In literature that focuses
on equivalence from a purely psychometric point of view, the term
'invariance' is more popular. To be clear, when we use the term 'invariance' in this chapter, it means 'equivalence'. In other words, a measure that shows invariance across two groups is deemed to be equivalent.
Various typologies of bias exist (e.g. Murphy & Davidshofer, 2005;
Van de Vijver & Leung, 2011), but for now, let us distinguish broadly
between measurement bias and predictive bias (American
Psychological Association, 2003):

Bias is defined as the undesirable property of a test, test scores, or test-score use that yields systematic and construct-irrelevant errors in (a) the measurement of a construct and/or (b) predictions made from such measures.

Next, we discuss measurement bias/equivalence, before we delve deeper into prediction bias/equivalence, and other forms of equivalence.

7.5.2 Measurement bias and equivalence
Measurement bias or inaccuracy exists if the score on the item/test
does not reflect the test-taker’s actual status on the attribute being
measured (Millsap, 2011).

Measurement bias can be defined as systematic inaccuracy in measurement.
Earlier, it was widely accepted that measurement bias is not very
common, but this view has shifted for a number of reasons.
Contemporary bias-analysis techniques provide more sophisticated
ways in which bias may be detected than was possible earlier. Also,
statistical-analysis software packages (used to conduct bias analysis)
are more sophisticated and readily available, and researchers are
more aware of them (Millsap, 2011). At the same time, the explosion
in cross-cultural research has created a greater need for, and
awareness of, measurement bias analysis (Matsumoto & Van De
Vijver, 2011). Jointly, these changes have led to a resurgence in the
use of measurement-bias analysis techniques. Bias analysis has
become a standard component of instrument development and
validation in addition to reliability and validity analyses.
Measurement bias can result from a host of factors, but when we
look at sources of bias that may be related to group factors (e.g.
culture, language, sex, etc.) in particular, measurement equivalence is
the primary concern.

Measurement equivalence is the level of comparability of measurement outcomes for different groups. For example, 'Do scores from personality scales have the same meaning within and between cultures?' (Van de Vijver & Leung, 2011, p. 19).

For instance, applied in a psychological testing situation, a depression inventory that provides consistently different scores for males and
females shows a lack of measurement equivalence in relation to
gender if these groups do not actually differ in terms of their ‘true’
levels of depression. A measure that is equivalent should function
similarly across varied ‘conditions’ (e.g. schools, provinces, gender or
population groups), so long as those varied conditions are irrelevant to
the attribute being measured.
Equivalence is essentially synonymous with another often-used term, namely invariance. When a test measures the same latent (underlying) dimension in the same way in all populations, it shows measurement invariance (MI) (Raykov, Marcoulides & Li, 2012).

Measurement invariance refers to a desired lack of systematic measurement variance as a function of group membership (Zickar, Cortina & Carter, 2010).

A lack of MI is a source of unwanted contamination or bias that threatens the validity of measurement (Zickar, Cortina & Carter, 2010).
So, you can see that ‘invariant’ means the same thing as ‘equivalent’
in this context.
However, bias is not always the culprit hiding behind score
differences. Noting that two or more groups score differently on a test
does not necessarily mean that the test is biased. For example, men
are generally able to pick up and carry heavier objects than women
are able to. Does this imply the measure (e.g. the ‘lifting test’) is
biased? A lifting test that shows gender-related mean differences is
probably not biased, but may simply be showing actual differences in
lifting ability. To determine whether a test shows measurement bias,
given the possibility that there could be real differences between
groups in the target variable being measured, we have to rely on a
principle of matching individuals on the variable first (Angoff, 1993;
Millsap, 2011). We could conclude that an item or test score is unbiased in relation to individuals from two or more groups who are identical (i.e. matched) on that variable if the probability of attaining any particular score on the item is the same for those individuals (Lord,
observation of average score differences with bias. Earlier we noted
that bias and equivalence may exist at any level of measurement.
Next, we discuss the nature of measurement bias/equivalence at
different levels of measurement and provide an overview of the major
techniques to detect this.
7.5.2.1 Bias and equivalence at the item level
Differential item functioning (DIF)
When differential item functioning (DIF) is investigated, statistical
procedures are used to compare test results of test-takers who have
the same ability but who belong to different cultural (or language)
groups. Hambleton, Swaminathan, and Rogers (1991) note that the
accepted definition of DIF is as follows:

An item shows differential item functioning if individuals having the same ability, but from different groups, do not have the same probability of getting the item right.

The ability level of test-takers is a critical factor since statistical procedures are based on the assumption that test-takers with similar
ability levels should obtain similar results on the measure. In addition,
it is unreasonable to compare test-takers with different levels of ability
since their test scores will inevitably differ, irrespective of their cultural
or linguistic backgrounds. Stated simply, equivalence cannot be
determined using unequal comparisons.
In practice, differential item analysis methods cannot detect bias as
such. Rather, these methods merely indicate that an item may
function differently or that it provides different information for test-
takers who have the same ability but belong to different subgroups, for
example male and female test-takers (Holland & Thayer, 1988). To
determine whether any bias exists, those items flagged need to be
further investigated to identify possible reasons for functioning
differently. Therefore, the term item bias has been replaced by the
less value-laden term differential item functioning (DIF). Thus, an item that exhibits DIF is not necessarily biased for or against any group; DIF merely flags possible bias, which must be confirmed by further evidence.
Mantel-Haenszel procedure
Among the statistical methods to investigate DIF is the Mantel-
Haenszel procedure, which is based on the assumption that an item
does not display any differential item functioning (DIF) if the odds of
getting an item correct are the same for two groups of test-takers
across all different ability levels. In this procedure, the odds ratio is
used to compare two groups as follows:
• test-takers are first matched into comparable groups, based on ability levels
• for each ability level separately, the pass/fail data for the two matched groups are tabulated in a two-by-two table
• the matched groups are then compared on every item of the measure at each ability level. The null hypothesis, that is, that the odds of a correct response to the item are the same for both groups at all ability levels, is tested using the chi-square statistic with one degree of freedom.
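A minimal numeric sketch of this calculation is given below in Python, using fabricated counts for three ability strata; in practice, dedicated DIF software and many more strata would be used. The odds-ratio and chi-square formulas follow the standard Mantel-Haenszel definitions.

```python
import numpy as np
from scipy.stats import chi2

# Mantel-Haenszel DIF check for one item. Each row is one ability stratum
# with counts: ref_correct, ref_wrong, focal_correct, focal_wrong.
# All counts are fabricated purely for illustration.
tables = np.array([
    [20, 30, 12, 38],   # low-ability stratum
    [35, 15, 25, 25],   # middle stratum
    [45,  5, 38, 12],   # high-ability stratum
], dtype=float)

A, B, C, D = tables.T           # unpack the 2x2 cells per stratum
N = A + B + C + D               # stratum sizes

# Common odds ratio: values far from 1 suggest the item favours one group.
alpha_mh = np.sum(A * D / N) / np.sum(B * C / N)

# MH chi-square (1 df) with continuity correction, testing the null
# hypothesis that the odds of a correct response are equal at all levels.
expected_A = (A + B) * (A + C) / N
var_A = (A + B) * (C + D) * (A + C) * (B + D) / (N**2 * (N - 1))
chi2_mh = (abs(A.sum() - expected_A.sum()) - 0.5) ** 2 / var_A.sum()
p_value = chi2.sf(chi2_mh, df=1)

print(f"MH odds ratio = {alpha_mh:.2f}, chi2(1) = {chi2_mh:.2f}, p = {p_value:.4f}")
```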

It is important to note that the identification and elimination of DIF from any measure increases the reliability and validity of scores. Test
results from two or more groups from which DIF items have been
removed are more likely to be equivalent and thus comparable.
Rasch model and DIF detection
Wyse and Mapuranga (2009) note that, traditionally, DIF analyses
have ‘focused on investigating whether items function differently for
subgroups by comparing the performance of the advantaged group,
the reference group (e.g. whites and males), to that of the
disadvantaged group, the focal group (e.g. females and blacks)' (pp. 333–334). They then argue that over time the focus of DIF analyses
has expanded to also include ‘the possible causes of DIF and their
relationship to testing contexts and situations’ (p. 334). In the process,
other DIF analysis methods have been developed. Wyse and Mapuranga (2009) report on a DIF analysis method that uses Rasch
item information functions. They derived the Information Similarity
Index (ISI) and found that it was as good as, if not better than, the
Mantel-Haenszel procedure. Van Zyl and Taylor (2012) used Rasch
model item information functions when conducting DIF analyses of the
Myers-Briggs Type Indicator (MBTI) for South African subgroups.
Readers are referred to this article to develop an applied
understanding of DIF analyses at the item level.
Item Response Theory
Item Response Theory (IRT) is a test theory used to develop and
assemble test items, detect bias in measuring instruments, implement
computerised adaptive tests, and analyse test data. IRT is an
extremely powerful theory that is especially useful for large-scale
testing programmes. By applying IRT, it is possible to analyse the
relationship between the characteristics of the individual (e.g. ability)
and responses to individual items (Hulin, Drasgow & Parsons, 1983;
Lord, 1980; Thissen and Steinberg, 1988). A basic assumption of IRT
is that the higher an individual’s ability level is, the greater the
individual’s chances are of getting an item correct. This makes sense
as good students generally obtain higher marks.
This relationship is graphically represented (see Figure 7.1) by the item characteristic curve (ICC). The X-axis in Figure 7.1 represents the ability level of test-takers, while the Y-axis represents the
probability of answering an item correctly. The difficulty level of an
item is indicated by the position of the curve on the X-axis as
measured by that point on the curve where the probability of
answering an item correctly is 50 per cent. The slope of the curve
indicates the discriminating power. The steeper the slope, the higher
the discriminating power of an item. Item 1 displays a curve for an
item of moderate difficulty that is highly discriminating, while the curve
for Item 2 implies that the item is a difficult item with very little
discrimination (for test-takers in the same group).
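The ICC can also be expressed mathematically. The sketch below uses the widely applied two-parameter logistic (2PL) IRT model, in which the probability of a correct response is P(θ) = 1 / (1 + e^(−a(θ − b))), with a the discrimination and b the difficulty; the parameter values are invented to resemble the two items described above.

```python
import numpy as np
import matplotlib.pyplot as plt

def icc(theta, a, b):
    """2PL item characteristic curve: probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

theta = np.linspace(-4, 4, 200)   # ability scale

# Item 1: moderate difficulty (b = 0) and high discrimination (steep slope).
# Item 2: difficult (b = 2) with very little discrimination (flat slope).
# Parameter values are invented for illustration.
plt.plot(theta, icc(theta, a=2.0, b=0.0), label="Item 1 (a=2.0, b=0.0)")
plt.plot(theta, icc(theta, a=0.4, b=2.0), label="Item 2 (a=0.4, b=2.0)")
plt.axhline(0.5, linestyle="--", linewidth=0.5)  # difficulty = theta where P = 0.5
plt.xlabel("Ability (theta)")
plt.ylabel("Probability of a correct response")
plt.legend()
plt.show()
```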
The disadvantage of IRT, however, is that relatively large sample
sizes are required for analysing the data. In addition, the complex
mathematical calculations require the use of sophisticated software,
which can be costly to purchase.
IRT and DIF detection
The detection of DIF using IRT is a relatively simple process. Once
item characteristics and test-taker ability measures are calculated, the
item characteristic curves (ICCs) of the groups under investigation can
be directly compared. This can be done graphically or statistically, and
involves determining whether any significant differences exist between
the respective ICCs. Figure 7.2 contains two ICCs for two groups of
test-takers with the same ability levels. Since the ability levels of both
groups are equal, the ICCs for both groups should be the same.
These ICCs, however, are clearly different, indicating that the item is
functioning differently for each of the groups. If many of the items in a
measure produce the same DIF picture, Group 2 will obtain lower test
scores, and the measure would not be measuring the construct as
well for Group 2 as it would for Group 1 (due to lower discriminatory
power for Group 2). Once the existence of DIF is confirmed, the next
step involves ascertaining the probable reasons for the DIF detected.
Generally, items detected as DIF are revised if possible, or removed
from the measure. Possible reasons for DIF can usually be traced to
the use of unfamiliar, inappropriate, or ambiguous language,
concepts, or examples; test speededness; and unfamiliar item format.
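Under the same 2PL assumptions as the sketch above, the graphical comparison of two groups' ICCs can be reduced to a single number, for example the unsigned area between the curves. The parameter values below are again invented for illustration, and an item with a large area would be flagged for substantive review rather than declared biased outright.

```python
import numpy as np

def icc(theta, a, b):
    """2PL item characteristic curve."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

theta = np.linspace(-4, 4, 401)

# Hypothetical parameters for the same item, estimated separately per group.
p_group1 = icc(theta, a=1.5, b=0.0)   # Group 1
p_group2 = icc(theta, a=0.6, b=1.0)   # Group 2: item harder, less discriminating

# Unsigned area between the ICCs: 0 means identical curves (no DIF signal);
# larger values flag the item for further substantive investigation.
area = np.trapz(np.abs(p_group1 - p_group2), theta)
print(f"Area between ICCs = {area:.3f}")
```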

Figure 7.1 Item characteristic curves for two items in the same group of test-takers
Figure 7.2 Item characteristic curves for a biased item (X) for two groups

» CRITICAL THINKING CHALLENGE 7.1

Look at the ICCs in Figure 7.2. Assume that Group 1 represents the
ICC for males and Group 2 the ICC for females for an item. Explain
to a friend why this item is biased against females.

7.5.2.2 Bias and equivalence in factor structures between groups
Since test scores often combine information from multiple items, it is
important that bias and equivalence are also investigated at ‘higher’
levels (than item-score level only). For example, factor-analytic
techniques (e.g. Vandenberg, 2002; Vandenberg & Lance, 2000)
explore differences in the mathematical factor structures of two or
more populations of test-takers. Factor analysis is a data-reduction
technique that aims to identify underlying dimensions or factors – or
linear combinations of original variables (Hair, Black, Babin &
Anderson, 2010) – that can account for the original set of scores on
various test items.
Factor analysis can be conducted using exploratory or
confirmatory techniques, depending on the goals of the investigation.
Exploratory Factor Analysis (EFA) is often used to assess
measurement equivalence, and involves conducting a separate factor analysis in each group (rotating the solutions to certain target solutions) and comparing the resultant factor solutions for evidence of
equivalence. EFA can easily be conducted in most statistical software
applications. EFA techniques have certain shortcomings as methods
to establish equivalence, such as low power to detect small
differences across cultures, especially when using long tests. Also,
since they focus on factor loadings only, other important statistical
parameters that may differ between groups cannot be compared (Van
de Vijver & Leung, 2011).
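One simple index often used in such EFA-based comparisons is Tucker's congruence coefficient (Tucker's phi), which quantifies the similarity of two factor-loading vectors. A minimal sketch follows; the loadings are invented for illustration, and values close to 1 are conventionally read as indicating factorial similarity.

```python
import numpy as np

def tucker_phi(x, y):
    """Tucker's congruence coefficient between two factor-loading vectors."""
    return np.sum(x * y) / np.sqrt(np.sum(x**2) * np.sum(y**2))

# Hypothetical loadings of six items on one factor, estimated separately
# in two language groups (values invented for illustration).
loadings_group1 = np.array([0.72, 0.65, 0.70, 0.58, 0.61, 0.69])
loadings_group2 = np.array([0.70, 0.66, 0.64, 0.55, 0.60, 0.67])

phi = tucker_phi(loadings_group1, loadings_group2)
print(f"Tucker's phi = {phi:.3f}")   # near-1 values suggest similar structures
```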
Aside from exploratory techniques, another way of assessing how
the factor structure of constructs differs across groups is through
confirmatory factor analytic (CFA) techniques.

Confirmatory factor analysis (CFA) is a way of testing how well measured variables, typically test item scores, represent a smaller number of underlying constructs (Hair et al., 2010).

Where EFA is principally used to explore data, CFA is used to provide a stronger confirmatory test of our measurement theory (i.e. how we propose that our set of test items 'tap into' underlying constructs). As
propose that our set of test items ‘tap into’ underlying constructs). As
an approach to investigate equivalence, confirmatory factor analysis is
more flexible than EFA and can address various levels of equivalence
(Van de Vijver & Leung, 2011). Also, CFA normally uses covariances
as a basis of analyses, and not correlation coefficients (like EFA).
Covariances contain information about the metric of the original
scales, and correlations do not.
Multi-group confirmatory factor analysis (MGCFA) techniques are
commonly used to investigate unwanted sources of variance in test
scores, because they allow one to understand the complex
multidimensional nature of contamination by isolating different sources
of variance. The goal of measurement equivalence studies that use
CFA is to identify group membership as a possible contaminant, that
is, an unwanted source of variance, in test scores.
Logic behind using CFA to study equivalence
At this point, it is appropriate to explain the logic of using CFA to
assess bias and equivalence. In statistical analyses of test-score data,
we use implicit or explicit measurement models to explain how test
items relate to underlying psychological characteristics that these
items are supposed to measure. When using factor-analytic
techniques to study measurement bias, these models are represented
in complex mathematical models that capture information about
relationships between items, and latent or underlying constructs. They
also specify measurement errors, variances of factors, and
covariances between factors.
It is possible that systematic inaccuracies, or biases, exist in any of
a wide array of characteristics of a measurement model that is
assessed between two or more groups of test-takers. As such, bias
analyses in respect of factor structures investigate systematic group-
related differences in any of a number of important statistical
parameter estimates. They are called parameter estimates because
they provide a statistical indication of how well the test measures a
particular construct within the population. Parameter estimates tell us
about the characteristics of a model that represents how items
contained within a measure are related to the underlying (latent)
factors, or traits, being measured.
A number of important measurement model parameters may
systematically differ (i.e. show bias) when test scores of two or more
groups are compared, including item-factor loadings, item-factor
regression intercepts, latent factor variances and covariances, factor
intercorrelations, and various error terms. The goal of measurement
invariance/equivalence studies (which will be discussed later) is to
identify and remedy these sources of measurement bias.
Levels of measurement equivalence
There are different typologies that describe forms of measurement
equivalence. For example, in cross-cultural psychology, a broad
equivalence framework has been proposed that portrays equivalence
at four major levels (Fischer & Fontaine, 2011):
• Functional equivalence exists when the same construct exists within each of the cultural groups. Although we may 'guess' from the results of our statistical analyses that the same construct is not being measured in two groups, it is also possible that a construct, as we define it in one group, does not even exist as a shared meaning in a different culture. For instance, many cultures do not have words that are labels for constructs commonly known in other cultures, such as 'depression' or 'burnout'.
• Structural equivalence implies that the same indicators (items) can be used to measure the theoretical construct. Structural equivalence is defined as the degree to which the internal structure of measurement instruments is equivalent between cultural groups. Structural equivalence may be explored using a decision-tree approach that proposes four data-analytic methods, namely multidimensional scaling (MDS), principal components analysis (PCA), exploratory factor analysis (EFA), and confirmatory factor analysis (CFA). Earlier, Van de Vijver and Leung (1997) used the term 'construct equivalence' to indicate a combination of functional and structural equivalence; the measurement aspect of structural equivalence can be separated from the theoretical discussion of functional equivalence of the theoretical construct, making functional equivalence a prerequisite of structural equivalence (Fischer & Fontaine, 2011).
• Metric equivalence implies that equal intervals exist between numeric values of the scale in all groups being compared, allowing relative comparisons between cultural groups (for instance, comparing the relative difference between males and females on aggression between cultural groups).
• Full-score equivalence, which is the highest form of equivalence, implies that the scale of the underlying dimension shares the same metric and the same origin between cultural groups. As such, cultural groups can be directly compared, i.e. the same observed score for test-takers from different groups refers to the same position on the underlying dimension in each of the cultural groups.
Testing measurement equivalence
At this point, it should be clear that measurement equivalence may
exist to varying degrees, but how do we statistically distinguish
between these levels? We can rely on factor structure comparisons in
MGCFA, which use nested sequences of multiple-group statistical
models that compare characteristics of structures across groups in
order to identify possible sources of lack of invariance (see Cheung &
Rensvold, 1999; Meredith, 1993; Muthén, 2000; Vandenberg & Lance,
2000):
Configural invariance: Configural invariance is the weakest form of
equivalence as it provides only limited justification that scores may be
equated across groups. Configural invariance is also known as equal
form invariance (Kline, 2011). Here, a measurement model is fitted
simultaneously to the covariance matrices of two or more independent
samples.

Configural invariance (CI) is said to exist when:
• we fit a specified measurement model simultaneously across groups
• only the number of factors and the factor-indicator correspondences for both groups are specified to be the same
• all other parameters are freely estimated
• it holds for both groups.

Configural invariance only requires that the same items load on the
same factors for both groups (Zickar, Cortina & Carter, 2010).
However, we cannot yet say whether differences may exist between
the groups’ respective factor loadings, item factor intercepts, unique
variances, correlations between factors, and so forth. Often, the model
that is tested here is used as a baseline model, against which more
restrictive successive models are tested, and additional parameters
are constrained in a step-by-step fashion.
Metric invariance: A ‘stronger’ form of equivalence exists when
metric invariance is established.

Metric invariance exists when the relationships between the observed variables (item responses) and the latent traits specified in the measurement model are equally strong across the groups. Metric invariance is also known as construct-level metric invariance or equal factor loadings (Kline, 2011).

Stated differently, the researcher's interest is in the hypothesis that the unstandardised factor loadings are invariant across the groups (Kline, 2011; Zickar, Cortina & Carter, 2010). As such, a test of metric
equivalence determines whether regression coefficients that link item
scores to their latent factor/s are the same for all groups. If this equal
factor loadings hypothesis is retained, we may conclude that the
constructs are manifested in the same way in each group (Kline,
2011). If it is found that certain items (or indicators) have different
loadings across groups, they are differentially functioning indicators
and, if some but not all indicators have equal loadings in each group,
indicator-level metric invariance exists (Kline, 2011).
Scalar invariance:

Scalar invariance tests for the equality of the observed variable intercepts (i.e. means) on the latent construct (Hair et al., 2010).

When item intercepts are set to be the same across groups, and the
model still fits, it is called scalar invariance (Vandenberg & Lance,
2000), intercept invariance (Cheung & Rensvold, 1998, 1999, 2000),
or full-score equivalence (Fontaine, 2005; Van de Vijver & Leung,
1997). If one wants to compare the mean scores of two groups, scalar
invariance needs to be supported.
Scalar equivalence simply means that the measurement scale is
identical across versions of the assessment (Sireci, 2011). For
example, scalar equivalence across Afrikaans- and English-language
versions of a test would mean that a score of 80 on one version of the
test ‘means the same thing’ as the same score on the other language
version of the test.
Apart from CFA techniques that assess scalar equivalence, item
bias analyses, such as differential item functioning (DIF) analysis, may
also be used to establish scalar equivalence. In such cases, DIF is
typically used in conjunction with exploratory factor analysis – EFA
identifies the equivalent factors, and DIF is used to examine the
appropriateness of items in more detail (Van de Vijver & Leung, 2011).

Factor variance-covariance invariance: If a test measures more than one factor, it would be useful to also investigate whether the relationships between latent (unobserved) factors are the same across groups being tested.

Factor variance-covariance equivalence/invariance assesses whether, for instance, a numerical reasoning factor and a verbal reasoning factor assessed by a general intelligence measure are related similarly in two or more groups.

A lack of this level of invariance means that, for example, in a sample of men these two aspects of intelligence are more strongly correlated than in a sample of women. Since testing the hypothesis of equality
across groups in this analysis involves both factor variances and
covariances, rejection of the hypothesis could imply that respondents
in both groups do not use the same scale range to respond to items
(factor variances) and that the pattern and strength of factor
intercorrelations (factor covariances) are not the same across groups
(Zickar, Cortina & Carter, 2010).
Error variance invariance: Lastly, it is also possible to determine
whether the errors of measurement, or residual variances, produced
by our measure are similar in two or more groups.

Error variance invariance implies that random and systematic errors of measurement are similar across groups (Hair, Black, Babin & Anderson, 2010). It is also called uniqueness invariance (Zickar, Cortina & Carter, 2010).
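In practice, these nested models are fitted with SEM software (such as LISREL, MPLUS, or AMOS; see Section 7.5.2.3), and each more constrained model is compared with the previous one on model fit. The sketch below shows only the decision logic once fit indices are in hand; the CFI values are invented, and the ΔCFI < .01 rule is a commonly cited heuristic in the invariance literature, not a universal standard.

```python
# Minimal sketch of the nested invariance-testing sequence. CFI values are
# invented for illustration; in practice they come from SEM software output.
fits = [
    ("configural", 0.952),  # same factor structure; all parameters free
    ("metric",     0.949),  # factor loadings constrained to be equal
    ("scalar",     0.931),  # item intercepts also constrained to be equal
]

DELTA_CFI_CUTOFF = 0.01     # commonly cited heuristic, not a universal rule

name, cfi = fits[0]
print(f"{name}: CFI = {cfi:.3f} (baseline)")
previous_cfi = cfi
for name, cfi in fits[1:]:
    delta = previous_cfi - cfi
    verdict = "supported" if delta < DELTA_CFI_CUTOFF else "NOT supported"
    print(f"{name}: CFI = {cfi:.3f}, delta CFI = {delta:.3f} -> {verdict}")
    if delta >= DELTA_CFI_CUTOFF:
        break               # stop: stricter levels of invariance are not tested
    previous_cfi = cfi
```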

7.5.2.3 Evaluation of measurement invariance (MI) approaches
CFA is a sophisticated and flexible technique that can simultaneously
detect bias at different levels, such as item bias and bias in factor
structures and related parameter estimates. Despite these
advantages, the use of CFA to assess equivalence is not without its
drawbacks. CFA using multi-group confirmatory factor analysis
(MGCFA) requires relatively large samples (Hair, Black, Babin &
Anderson, 2010) and the most appropriate use of fit statistics and
tests of significance is a hotly debated issue in structural equation
modelling (e.g. Raykov, Marcoulides & Li, 2012). As a result,
researchers and practitioners should stay abreast of new
developments. Lastly, CFA is a relatively complex statistical analysis
technique that can only be conducted using specialised statistical
software, such as LISREL, MPLUS and AMOS, to name a few
packages.
To conclude, we can see that measurement bias/equivalence
studies play a critical part in our efforts to ensure that measures are
unbiased. However, it was noted earlier that bias may also creep into
aspects not related to measurement per se. Millsap (2011) takes a
broader perspective on bias/equivalence and includes, in addition to
equivalent measurement outcomes, the equivalence (or invariance, as
he puts it) of prediction outcomes. Next, we turn to prediction bias and
equivalence.

7.5.3 Prediction bias and equivalence
The focus of bias research is generally on measurement aspects, and
not necessarily on the way test scores are used in decision-making.
However, very few psychology practitioners test others simply ‘for the
sake of testing’. Apart from research uses, these test scores are
widely used for decision-making in psychological practice. When we
make decisions based on applicants’ test scores, we are also
(implicitly or explicitly) making predictions of their future expected
behaviour. If so, prediction bias may occur, which is a problem that
results when the way in which scores on an important outcome
(criterion) are predicted for persons from an identifiable group (e.g.,
race, gender, age, ethnicity, or language groups) differs systematically
from the way it is predicted for members from another group. Stated
otherwise, group membership may influence the prediction of criterion
scores.

Prediction bias exists when there is a systematic group-related difference in the way in which criterion scores are predicted for different subgroups.

By extension, prediction equivalence exists when the expected criterion (Y) score for two test-takers from different groups is the same when their predictor scores are the same. In other words, group membership does not unduly affect (or moderate) the estimated criterion score for any applicant.
7.5.3.1 Procedures to establish prediction bias and equivalence
Procedures that are based on multiple-regression analysis (e.g.
Cleary, 1968), such as moderated multiple-regression analysis
(MMR, Aguinis & Stone-Romero, 1997), are normally used to
establish predictive bias. In this approach, test-takers’ scores on some
outcome variable (e.g. academic performance) are regressed against
a combination of their predictor scores (e.g. matric results), their group
membership (e.g. gender group), and a variable that combines these
two (i.e. as an ‘interaction variable’, such as Matric*Gender). The
analysis results tell us whether it is possible to predict the criterion
(e.g. performance) in the same way for persons from different groups.
If the results suggest we cannot, this phenomenon is called
differential prediction (or a lack of prediction invariance; Millsap,
2011). In essence, the relationship between the predictor (X) and
criterion (Y) depends on, or is moderated by, group membership (e.g.
race, gender). Stated otherwise, prediction invariance fails when the test’s predictive properties are moderated by population membership (Millsap, 2011).
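
To make this concrete, here is a minimal moderated multiple-regression sketch in Python using statsmodels. The variable names (matric, gender, performance) and the tiny data set are hypothetical and purely illustrative; a real analysis would require an adequate sample and the usual regression diagnostics.

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: 'matric' is the predictor (X), 'performance' the
# criterion (Y), and 'gender' the grouping variable.
df = pd.DataFrame({
    "matric":      [55, 62, 70, 48, 66, 73, 59, 80, 51, 68],
    "gender":      ["F", "F", "F", "F", "F", "M", "M", "M", "M", "M"],
    "performance": [58, 63, 72, 50, 69, 70, 55, 78, 49, 64],
})

# Moderated multiple regression: Y ~ X + group + X*group interaction.
model = smf.ols("performance ~ matric * gender", data=df).fit()
print(model.summary())

# Cleary-style interpretation: a significant matric:gender interaction
# indicates slope bias; a significant gender main effect with equal
# slopes indicates intercept bias; neither suggests prediction equivalence.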
To explain this further, it is best to use an illustration (see Figure 7.3). Suppose we draw regression lines on an X-Y axis, indicating the lines that best describe the relationship between X and Y for two (or more) groups of candidates. Let X, the independent variable, represent a student’s score on a predictor test (e.g. conscientiousness), and Y, the dependent variable, the same person’s score on a criterion measure (e.g. their university academic performance). Now, we plot the regression lines for two groups, where Line 1 represents women and Line 2 represents men (see Figures 7.3 and 7.4). Predictive bias is
observed when the lines for both gender groups do not coincide, i.e.
there is a significant difference between the two lines in either or all of
their slopes, their intercepts, or the prediction errors (Cleary, 1968).
Stated otherwise, bias in prediction exists when the regression
equations (y = a + bx) used to predict a criterion in two or more groups
differ significantly (statistically speaking) in terms of their intercepts,
their slopes, their standard errors of estimate, or any combination of
these (Murphy & Davidshofer, 2005).
Figure 7.3 shows the scenario where no predictive bias exists, i.e.
the expected academic achievement scores for two individuals from
different groups are the same, when their levels of conscientiousness,
or predictor (X) scores, are the same.

Figure 7.3 No predictive bias, because lines of prediction coincide (slopes and intercepts are equal)
Figure 7.4 Predictive bias in intercepts only

Figure 7.5 Predictive bias in slope only

Figure 7.6 Predictive bias in slopes and intercepts

In contrast, it is possible that two groups may differ on average in terms of their criterion performance (Y) only. In such a case, we would observe the scenario depicted in Figure 7.4: predictive bias in which only the intercepts of the two groups’ regression lines differ, also called intercept bias.
Apart from the no-bias scenario and intercept-only bias, it is also possible to observe slope bias only (Figure 7.5) and, in the worst-case scenario, both slope and intercept bias (Figure 7.6).
When assessment practitioners refer to the term ‘bias’, they should
be careful to not confuse measurement bias with predictive bias. If you
have understood the arguments outlined above, you would also grasp
that, first, bias is not necessarily a characteristic of a test per se, but
rather a property of inferences derived from test scores and the way
in which these scores are used. For instance, one may use a test
that does not show measurement bias at all, but which still shows
prediction bias when coupled with a criterion measure in a decision-
making context (for instance, if the criterion scores or prediction errors
differ systematically between groups, or if there are differences in
score reliabilities).

Prediction bias and the Employment Equity Act


It is interesting to note that the Employment Equity Act (No. 55 of 1998) prohibits the use of a psychological test or similar assessment unless such a measure ‘is not biased against any employee or group’ (p. 16, par. 8 (c)); this implies that both measurement and predictive bias are prohibited (see also the Promotion of Equality and Prevention of Unfair Discrimination Act; Republic of South Africa, 2000). In assessment practice, however, bias analyses typically focus on measurement bias, while predictive bias is often neglected. In many countries (e.g. the US), predictive bias is generally considered evidence of unfair selection decision-making by courts of law (SIOP, 2003). When we say tests are fair in the psychometric sense, we mean that they do not result in biased predictions of criterion scores for different groups.

Fairness, therefore, is evident when one observes a lack of predictive bias (Cleary, 1968).

South African legislation and court rulings do not yet distinguish between measurement and predictive bias, although a broader
interpretation of bias dictates that more attention should be given to
both forms of bias. It is important to note that establishing prediction
invariance involves scrutiny of many aspects other than only predictor
scores, including criterion scores, reliabilities and variances of
measures, and others (Aguinis & Stone-Romero, 1997; Theron, 2007).
A wide range of statistical software is available to conduct
analyses for both types of bias. Measurement bias analyses to assess
DIF may be conducted in MS Excel (with special macros, e.g. Biddle
Consulting Group Inc., 2005), SPSS (SPSS, 2012) or specialised DIF
software. Structural equation modelling software (such as LISREL,
Jöreskog & Sörbom, 2012), IBM SPSS Amos (IBM Corporation,
2012), and M-Plus (Muthén & Muthén, 1998−2017) are routinely used
to assess measurement bias with confirmatory factor analysis (CFA).
Predictive bias analysis is easily conducted in MS Excel (with the Analysis ToolPak add-in), as well as in other basic statistical
software that can conduct regression analyses (e.g. IBM SPSS, IBM
Corporation, 2012; StatSoft Statistica, Statsoft, 2012).
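
As an illustration of the kind of DIF analysis such software automates, the sketch below applies a Mantel-Haenszel-style test – a classical DIF method, named here as an example rather than taken from this chapter – using statsmodels’ StratifiedTable. The strata and cell counts are invented: each 2x2 table cross-tabulates group (reference/focal) by item response (correct/incorrect) within a total-score band.

import numpy as np
from statsmodels.stats.contingency_tables import StratifiedTable

# Hypothetical counts for one item. Rows: reference vs focal group;
# columns: item correct vs incorrect. One 2x2 table per ability stratum
# (test-takers grouped into bands by their total test score).
tables = [
    np.array([[30, 20], [22, 28]]),  # low total-score band
    np.array([[45, 15], [35, 25]]),  # middle total-score band
    np.array([[50, 10], [46, 14]]),  # high total-score band
]

st = StratifiedTable(tables)
result = st.test_null_odds(correction=True)  # Mantel-Haenszel chi-square

print("Pooled odds ratio:", round(st.oddsratio_pooled, 2))
print("MH chi-square p-value:", round(result.pvalue, 3))
# A pooled odds ratio far from 1 together with a small p-value flags the
# item for DIF, i.e. group-related differences after matching on ability.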

7.5.4 Construct bias and construct equivalence


Certain constructs that may be familiar in one cultural group may not
be familiar in another, or a particular construct may be expressed in a
different way.

Construct bias results when behaviours that are relevant to the


construct do not completely overlap in two different cultural groups,
i.e. the construct has a lack of shared meaning between two groups.

This differential appropriateness of the behaviours associated with the construct across cultures is called construct bias (Van de Vijver &
Leung, 2011). For instance, it has been suggested that for some
cultural groups depression is expressed in terms of physical
symptoms rather than in the general Western experience of this
disorder, where depression is expressed in terms of subjective and
affective symptoms (Matsumoto, 2000). If this is true, questionnaires
that measure depression cannot be used in different cultures without
succumbing to construct bias. Construct bias refers to a certain
inconsistency in the attribute being measured across groups, which
may relate to the measurement of a construct across cultural groups
using a monolingual measure of the construct (e.g. black and white
applicants completing a personality inventory in English), or, a
multilingual measure of the construct (e.g. isiXhosa- and English-
speaking test-takers respectively take the isiXhosa- and English-
language versions of a South African personality test) (Sireci, 2011).
Construct bias, therefore, means that a construct lacks a shared meaning between two or more groups, which makes any further group comparison on measures of such a construct impossible (Van de Vijver & Leung, 2011).

7.5.5 Nomological bias


When developing and validating a new measure, it is important that
we can demonstrate that it measures the same thing as other
measures of the same construct, and that it does not correlate with
measures of distinctly different constructs. A nomological network
can be described as the system of relationships among a set of
constructs (Cronbach & Meehl, 1955). Van de Vijver and Leung
(2011) state that, when nomological networks are the same across
two or more groups, functional equivalence exists – a specific type of
‘structural equivalence’:

Functional equivalence exists when a questionnaire measures the same construct in two cultures, and the same patterns of convergent validity (significant relationships with presumably related measures) and divergent validity (i.e. zero correlations with presumably unrelated measures) are observed with other measures.

One way to assess nomological/functional bias is to utilise multi-group structural equation modelling (MGSEM) to test the equivalence of a comprehensive structural model across samples drawn from different populations.
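
Before fitting a full MGSEM, a quick informal check is to compare the convergent and divergent correlation patterns group by group. The Python sketch below uses simulated data and hypothetical variable names (new_measure, convergent, divergent) purely to illustrate the idea; MGSEM turns this eyeball comparison into a formal simultaneous test.

import numpy as np
import pandas as pd

# Simulated (hypothetical) data for two cultural/language groups.
rng = np.random.default_rng(42)
n = 200
df = pd.DataFrame({
    "group": np.repeat(["A", "B"], n // 2),
    "new_measure": rng.normal(50, 10, n),
})
df["convergent"] = 0.7 * df["new_measure"] + rng.normal(0, 7, n)  # related
df["divergent"] = rng.normal(50, 10, n)                           # unrelated

# Functional equivalence check: in both groups the new measure should
# correlate substantially with the convergent measure and near zero with
# the divergent measure, and the two patterns should look alike.
for group, sub in df.groupby("group"):
    r_conv = sub["new_measure"].corr(sub["convergent"])
    r_div = sub["new_measure"].corr(sub["divergent"])
    print(f"Group {group}: convergent r = {r_conv:.2f}, divergent r = {r_div:.2f}")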
7.5.6 Method bias
It is important to note that the term ‘bias’ is a much broader term that
includes systematic distortion in scores due to various effects not
necessarily related to people’s group membership, such as rater bias,
response-set bias, influences of other irrelevant constructs on scores,
and so forth. One source of bias resides in group-related differences
in test-score meaning, for example when the group (e.g. gender, race,
age) that a person belongs to contaminates or confounds test scores
beyond that person’s true level on the construct being measured. In
contrast, method bias can be described as a systematic source of
error affecting scores due to a bias that is inherent in the method of
measurement and/or the measurement process (Van de Vijver &
Leung, 2011).
In cross-cultural assessment, examples of method bias may
include test-administrator bias (e.g. of interviewers, translators,
coders), differential familiarity with item formats, differential social
desirability, and response sets such as acquiescence (Johnson,
Shavitt & Holbrook, 2011; Sireci, 2011). Method bias typically affects
all items within an assessment, and not only individual items (Sireci,
2011).

» ETHICAL DILEMMA CASE STUDY 7.1

Thandi, who is isiZulu-speaking, applies for admission to the MA in Counselling Psychology programme. As part of the selection process, she has to complete a
personality test. Her profile on the test is elevated, which suggests the presence
of some pathology. Based on this, the selection panel decides not to admit her to
the Master’s programme. Thandi explores research on this personality test and
discovers that the test profile of black South Africans is usually elevated and that
no studies have been undertaken to establish whether or not the test is
measuring the same construct across different cultural groups in South Africa.
Consequently, she lodges a complaint against the university with
the Professional Board for Psychology as she believes that the
selection process was unfair.
Questions:
(a) Do you think Thandi has a strong case? Identify the potential ethical
transgressions of which the psychology department might be guilty.
(b) Provide the psychology department with suggestions on how to research the
personality test to determine if it is measuring the same construct across
various groups. If construct equivalence is not found across different groups,
advise the department how they can go about adapting the measure so that it
can be used fairly with applicants from diverse cultural groups.

7.6 Steps for maximising success in test adaptations
The International Test Commission (ITC) provides a set of 18
guidelines for improving the translating and adapting of educational
and psychological instruments (International Test Commission, 2016;
available at www.intestcom.org). The guidelines fall into six broad
topics: pre-condition arrangements, test development, empirical
analyses, administration, score scales and interpretation, and
documentation. As we conclude our discussion on test adaptation,
and with the ITC guidelines in mind, we will consider the steps
involved in adapting measures.
The process of adapting an assessment measure involves the following steps:
• review construct equivalence in the languages and cultures of interest
• decide whether test adaptation is the best strategy
• choose a multidisciplinary team with expertise in the construct domain of the measure, measurement and psychometrics, and the languages and cultural groups for which the measure will be adapted
• plan all the aspects that require adaptation. This might require administering the original measure to a small sample from the target group for the adaptation, and interviewing the test-takers to gain insight into how they experienced the measure
• adapt the instrument using appropriate approaches and methods
• review the adapted version and make necessary changes
• conduct a small ‘tryout’ of the adapted version, and use the resultant information to guide further refinement
• administer the refined, adapted version to a large sample of test-takers from the target population
• place scores of both the adapted and the original measure on a common scale (see the sketch after this list)
• conduct a validation investigation, which includes investigations into bias and equivalence
• establish the psychometric properties of the adapted version
• generate the norms, or set the performance standards or cut-scores, for the adapted version
• document the process and prepare the manual for test users
• submit the adapted version to the Psychometrics Committee of the Professional Board for Psychology for classification.
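
To illustrate the ‘common scale’ step, one simple option – assuming a linking sample that completed both the original and the adapted version – is mean-sigma linear equating. The Python sketch below uses invented scores for illustration only; operational equating would use much larger samples and possibly equipercentile or IRT-based designs.

import numpy as np

# Hypothetical scores from a small linking sample that took both versions.
original = np.array([34, 41, 29, 38, 45, 31, 40, 36, 27, 42], dtype=float)
adapted = np.array([30, 37, 26, 33, 41, 27, 36, 31, 24, 39], dtype=float)

def mean_sigma_equate(x, source, target):
    """Map a score x from the source scale onto the target scale:
    y = mu_target + (sigma_target / sigma_source) * (x - mu_source)."""
    slope = target.std(ddof=1) / source.std(ddof=1)
    return target.mean() + slope * (x - source.mean())

# An adapted-version score of 35 expressed on the original scale.
print(round(mean_sigma_equate(35.0, adapted, original), 1))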

» CRITICAL THINKING CHALLENGE 7.2

Consider the steps involved in developing a psychological test that were outlined in Chapter 6, Section 6.2. Is an essentially similar process followed
when adapting a measure? Are there any differences in the processes?

7.7 Multicultural test development


There are two main approaches followed when making measures
available for use in multicultural contexts, namely, a successive and a
simultaneous multicultural approach (Tanzer, 2005). The most popular
one is the successive approach. In successive test development, the
measure is developed for a monocultural/monolingual context. When
the measure proves to be useful, it is adapted for use in other
cultural/language groups. Using a successive approach to adapting
measures for use with multiple cultures and language groups has
been a key focus in this chapter. Examples of where the successive
approach has been used in South Africa are the adaptation of the
WAIS-III for South Africa (see Chapter 10), and the adaptation and
translation of the 16PF5 and the 15 Factor Questionnaire Plus
(15FQ+). While this approach is the one of choice – to adapt widely
used international measures for the South African context – the main
drawbacks are that the constructs measured in the original and
adapted versions might not be equivalent, and the way in which the
construct is operationalised might not be as relevant or appropriate for
the cultural group for which it is being adapted (Cheung & Fetvadjiev,
2016).
Tanzer (2005) makes a strong case for the advantages of using a
simultaneous multicultural approach when developing tests to be
used in multiple cultures and languages. As was discussed in Chapter
2, when this approach is used, a multicultural and multidisciplinary
panel or committee of experts is assembled. Panel members need to
have expertise in test development and psychometrics, in psychology
and in cross-cultural psychology in particular, in linguistics, in relation
to the cultures and language groups for whom the measure is being
developed, and in terms of the construct being assessed (e.g.
personality). While it is not necessary that each expert possesses all
this expertise, the panel or committee must collectively reflect this
expertise.
In the development of the South African Personality Inventory
(SAPI) (Nel et al., 2012; Fetvadjiev, Meiring, Van de Vijver, Nel & Hill,
2015), a simultaneous multicultural approach was used. A core panel
of national and international experts in personality-test development
and cross-cultural psychology spearheaded the development of the
SAPI. In addition, Masters’ and doctoral students from various South
African cultural and language groups interacted closely with the core
panel to develop the data collection and analysis methods. Other
experts, such as linguists, sociologists and anthropologists, were
consulted and brought into the core panel at appropriate points in the
test-development process. The value of having such a diverse array of
experts involved in developing a measure such as the SAPI is that the
way in which the construct is operationalised, as well as the content of
the measure, is more likely to be appropriate for use in the multiple
cultures for which it is being developed than is the case with the
successive approach. A further interesting feature of the development
of the SAPI is that in the first phase of its development, ordinary adults
across the age and socio-economic spectrum, from rural and urban
areas, provided descriptions of personality, which were clustered and
categorised to uncover the personality dimensions for each of the 11
language groups in South Africa. This should further ensure that the
operationalisation of the personality construct should be appropriate
for the diverse cultural and language groups in South Africa.
Once the construct has been delineated in the simultaneous
approach, the test specifications need to be determined, and the items
need to be developed. Items can be developed by the expert panel or
committee. As the panel or committee includes language and cultural
experts they can ensure that the items do not contain culturally loaded
content or idiomatic expressions that will only be meaningful for one
cultural or language group. Another approach is to establish various
test-development teams, with each team having expertise in one of
the cultures and language groups for whom the test is being
developed. In this instance, careful planning is required. Solano-
Flores, Turnbull and Nelson-Barber (2002) report that in order for
multiple test-development teams to generate items that contribute to
the overall measure, ‘shells’ first need to be developed. A ‘shell’ is a
framework or a template that specifies the characteristics of ‘families’
or types of problems, which makes it possible for the test-development
teams to generate items of a similar structure and complexity level.
Items developed in one language can then be translated, using the techniques described earlier in this chapter, into the other languages in which the measure is being developed. When the different language
versions are administered, they need to be researched to ensure that
equivalent constructs are being tapped across the various versions,
and that the reliabilities are acceptable.

ADVANCED READING

To know more about translating and adapting measures and developing indigenous measures, read:
Cheung, F.M. & Fetvadjiev, V.H. (2016). Indigenous approaches to testing and assessment. In F.T. Leong, D. Bartram, F.M. Cheung, K.F. Geisinger & D. Iliescu (Eds.), The ITC handbook of testing and assessment (pp. 333-346). New York, NY: Oxford University Press.
International Test Commission. (2016). ITC guidelines for translating and adapting tests (2nd ed.). Available at http://www.psyssa.com
Van de Vijver, F.J.R. (2016). Test adaptations. In F.T. Leong, D. Bartram, F.M. Cheung, K.F. Geisinger & D. Iliescu (Eds.), The ITC handbook of testing and assessment (pp. 364-376). New York, NY: Oxford University Press.

7.8 Conclusion
In the context of a multicultural society like South Africa, the adaptation of measures and the detection and elimination of bias from measures play a vital role in the transformation process. To this end,
it is important that rigorous methods and approaches (such as those
outlined in this chapter) are used if the information obtained from
assessment measures is to be valid and reliable. As we pointed out in
Chapters 1 and 2, assessment practitioners must accept responsibility
for ensuring that the development and use of measures, and the
interpretation and reporting of test information, are non-discriminatory,
unbiased, and fair towards all South Africans.
And so … you have finished the long climb through the Foundation
Zone. You should give yourself a little check-up to see if you have
mastered the basic concepts in Chapters 1 to 7 that will be used in the
remainder of this book. The set of questions below will help you check
that you can move on with confidence to the Assessment Practice
Zone. If, however, you stumble over a few of these questions, do
some backtracking down the hill, just to familiarise yourself with some
of the more difficult concepts.

CHECKING YOUR PROGRESS 7.1


7.1 What is a psychological measure/test?
7.2 Why must a psychological measure be reliable?
7.3 What are the different forms of reliability that can be established?
7.4 Why must a psychological measure be valid?
7.5 What are the different types of validity that can be established?
7.6 Why are norms used?
7.7 What different types of norms can be established?
7.8 List the steps involved in developing a measure.
7.9 Why do we need to adapt measures?
7.10 What are the different aspects that must be considered when
adapting a measure for use in multilingual and multicultural contexts?
7.11 Discuss some of the key statistical approaches used to
investigate bias and equivalence.
7.12 What are the major pitfalls (historical and current) associated with using
psychological measures in the multicultural South African context?

The last question (7.12) sets the scene for the next zone of our
journey, and our next stop in particular.

* It is important to note that bias implies systematic variation in test scores due to one or a
number of influences, and not random variation (Cohen & Swerdlik, 2010).
Assessment Practice Zone
Now that we are out of the Foundation Zone, things are going to become a
little less theoretical and a little more practical or applied. In the
Assessment Practice Zone you will get a view of good assessment
practices in general, as well as the administration of measures in particular.

Part 2
8 The practice of psychological assessment
9 The administration of psychological assessment measures
The practice of psychological
assessment: Controlling the use
of measures, competing values,
and ethical practice standards
CHERYL FOXCROFT, GERT ROODT, AND FATIMA ABRAHAMS
Chapter 8

CHAPTER OUTCOMES
In this chapter you will learn more about who is legally allowed to use
psychological assessment measures in South Africa, why the use of
assessment measures needs to be controlled, and the ethical
practice standards to which assessment practitioners should aspire.
Please note: In this chapter the term ‘use’ relates to whoever may
administer, score, interpret, and report on assessment measures and
the outcome of assessment, and not to other users of assessment
information such as human resource managers, teachers, and so on.
By the end of this chapter you will be able to:
› give reasons why the use of psychological measures should be controlled
› explain the statutory control of psychological assessment in South Africa
› list the different categories of assessment practitioners in South
Africa, indicate the types of measures they are allowed to use,
and discuss their training and registration requirements
› understand why and how psychological measures are classified
› describe good, ethical assessment practices, and the
roles and responsibilities of the various stakeholders.

8.1 Introduction
In this chapter we introduce you to the nature of the statutory control
of psychological measures in South Africa and the reasons why
measures are classified. We will outline the different types of
psychology professionals that may use measures and the purposes
for and/or contexts in which they may use measures. Finally, we will
discuss fair and ethical assessment practices.

8.2 Statutory control of the use of psychological assessment measures in South Africa
8.2.1 Why should the use of assessment
measures be controlled?
Assessment measures are potentially valuable tools that aid decision-
making, the planning of interventions and development initiatives, and
researching human behaviour, as we saw in Chapter 1. However, the
item content of some psychological assessment measures might tap
into very personal information. This might cause psychological trauma
to the individual being assessed if a professionally trained, caring
professional is not on hand to work through any traumatic reactions
during the course of assessment. Furthermore, feedback regarding
psychological assessment results needs to be conveyed to the person
being assessed in a caring, sensitive way, so that he or she does not
suffer any emotional or psychological trauma. We pointed out in
Chapter 1 that assessment measures and their results can be
misused, which can have negative consequences for those being
assessed. In view of the potentially sensitive nature of some of the
item content and the feedback, and given that assessment measures
can be misused, the use of assessment measures needs to be
controlled so that the public can be protected. Controlling the use of
psychological measures by restricting them to appropriately trained
professionals should ensure that:
• the measures are administered by a qualified, competent assessment practitioner, and that assessment results are correctly interpreted and used
• the outcome of the assessment is conveyed in a sensitive, empowering manner rather than in a harmful way
• the purchasing of psychological assessment measures is restricted to those who may use them and test materials are kept securely (as it is unethical for assessment practitioners to leave tests lying around) – this will prevent unqualified people from gaining access to and using them
• test developers do not prematurely release assessment materials (e.g. before validity and reliability have been adequately established), as it is unethical for assessment practitioners to use measures for which appropriate validity and reliability data have not been established
• the general public does not become familiar with the test content, as this would invalidate the measure. Should a measure be published in the popular press, for example, it would not only give members of the public insight into the test content, but could also give them a distorted view of psychological assessment.

❱❱ CRITICAL THINKING CHALLENGE 8.1

As a psychologist, you find yourself testifying in court at a murder trial. You assessed
the person accused of the murder on a battery of measures, which included a
measure of general intelligence as well as a personality measure. While you are
testifying regarding your assessment findings, the judge asks you to show some of
the intelligence test items to the court so that they can get a more concrete idea of
what was assessed. How would you respond to this request? What factors would
you weigh up in your mind to help you to formulate your response?

8.2.2 How control over the use of psychological assessment measures is exercised in South Africa
8.2.2.1 Statutory control
In South Africa, the use of psychological assessment measures is
under statutory control. What does this mean?
A law (statute) has been promulgated that restricts the use of
psychological assessment measures to appropriately registered
psychology professionals. According to the Regulations Defining the
Scope of the Profession of Psychology (Department of Health 2008;
as amended with the addition of an annexure, 2011), which are part of
the Health Professions Act (No. 56 of 1974), the following acts related
to assessment are defined in Section 2 (a) to (f) as ‘specially
pertaining to the profession of psychology’:
(a) the evaluation of behaviour or mental processes or personality
adjustments or adjustments of individuals or groups of persons,
through the interpretation of tests for the determination of
intellectual abilities, aptitude, interests, personality make-up or
personality functioning, and the diagnosis of personality and
emotional functions and mental functioning deficiencies according
to a recognised scientific system for the classification of mental
deficiencies
(b) the use of any method or practice aimed at aiding persons or
groups of persons in the adjustment of personality, emotional or
behavioural problems or at the promotion of positive personality
change, growth and development, and the identification and
evaluation of personality dynamics and personality functioning
according to psychological scientific methods
(c) the evaluation of emotional, behavioural and cognitive processes
or adjustment of personality of individuals or groups of persons by
the usage and interpretation of questionnaires, tests, projections
or other techniques or any apparatus, whether of South African
origin or imported, for the determination of intellectual abilities,
aptitude, personality make-up, personality functioning,
psychophysiological functioning or psychopathology
(d) the exercising of control over prescribed questionnaires or tests
or prescribed techniques, apparatus or instruments for the
determination of intellectual abilities, aptitude, personality make-
up, personality functioning, psychophysiological functioning or
psychopathology
(e) the development of and control over the development of
questionnaires, tests, techniques, apparatus or instruments for
the determination of intellectual abilities, aptitude, personality
make-up, personality functioning, psychophysiological functioning
or psychopathology.
(f) the use of any psychological questionnaire, test, prescribed
techniques, instrument, apparatus, device or similar method for
the determination of intellectual abilities, aptitude, personality
make-up, personality functioning, temperament,
psychophysiological functioning, psychopathology or personnel
career selection, and for this purpose the board will publish a
Board Notice listing the tests which are classified by the board for
use by registered psychologists.

Therefore, according to the Regulations Defining the Scope of the Profession of Psychology as amended, and the Health Professions Act
(No. 56 of 1974), the use of measures to assess mental, cognitive,
or behavioural processes and functioning, intellectual or cognitive
ability or functioning, aptitude, interest, emotions, personality,
psychophysiological functioning, or psychopathology (abnormal
behaviour), constitutes an act that falls in the domain of the
psychology profession. Given that in everyday life you come across a
large array of tests (e.g. the test you take to get your learner’s
licence, questionnaires to determine preferences and interests of
consumers), how would you know which tests result in a
psychological act being performed? As stated in the regulations cited
above, the Professional Board has to publish a list of tests that have
been classified as being psychological tests and which can only be
used by psychology professionals.
Does this imply that only registered psychology professionals may
use psychological assessment measures? The answer is ‘No’. Within
the psychology profession, there are various categories of
professionals who may use psychological measures to varying extents
according to their scope of practice. Furthermore, with the permission
of the Professional Board for Psychology, various other professionals
(e.g. occupational therapists, speech therapists) may be permitted to
use certain psychological measures. Let us consider the different
categories of psychology professionals and their scope of practice in
greater detail.
8.2.2.2 The different categories of psychology professionals
who may use psychological measures
According to the Regulations Defining the Scope of the Profession of
Psychology as amended (Department of Health, 2011), the profession
of psychology in South Africa consists of psychologists (clinical,
counselling, educational, research, industrial, neuro-
psychologists, and forensic psychologists), registered
counsellors, and psychometrists (independent practice and
supervised practice). The category of psychotechnician was
previously part of this list, but it and the psychometrist (supervised
practice) category are being phased out: No new psychotechnicians or
psychometrists (supervised practice) will be added to the register and
those who were registered before the register was closed may
continue to practise in their capacity as a psychotechnician or
psychometrist (supervised practice). Please also note that the
registers for neuropsychologists and forensic psychologists will be
opened in due course as these are new professional psychology
categories in South Africa. Based on their relevant scope of practice,
the extent to which each of these categories of practitioners may use
psychological assessment measures, as well as the training
requirements of each, has been summarised in Table 8.1.
Table 8.1 Psychology professionals who may use psychological measures

PSYCHOTECHNICIAN
Aspect of test use, range of measures and context/nature of assessment: Administer, score, and interpret standardised group or individual screening-type tests. May not perform specialised assessments.
Work setting and billing: Must work under supervision and may not bill clients for assessment.
Training requirements: Bachelor’s degree with psychology; a …-month internship; a pass in a national exam set by the Professional Board.

REGISTERED COUNSELLOR
Aspect of test use, range of measures and context/nature of assessment: Administer, score, interpret, and report on certain measures (i.e. measures trained in; may not use projective personality measures, specialist neuropsychological measures, or measures for the diagnosis of psychopathology) for the purposes of psychological and mental-health screening, with the aim of referring to an appropriate professional or to develop interventions to enhance personal functioning in community contexts.
Work setting and billing: Those currently registered can set up a private practice and may bill clients.
Training requirements: B.Psych; 720 hours (six months) supervised practicum; 70% for a national exam set by the Professional Board.

PSYCHOMETRIST (INDEPENDENT PRACTICE)
Aspect of test use, range of measures and context/nature of assessment: Administer, score, interpret, and report on measures (i.e. measures trained in, but may not use projective personality measures, specialist neuropsychological measures, or measures for the diagnosis of psychopathology) for the purposes of assessing cognitive and personality functioning, interests, and aptitudes in diverse settings and organisations. Based on the results, referrals can be made to appropriate professionals for more comprehensive assessment or psychological interventions. May also contribute to the development of psychological measures.
Work setting and billing: Can set up a private practice and may bill clients for assessment.
Training requirements: B.Psych or equivalent with a psychometry focus that includes 720 hours of supervised practicum; alternatively, a psychometrist (supervised practice) can work under a psychologist for three years, with … hours of supervised practical work; 70% in a national exam set by the Professional Board.

PSYCHOMETRIST (SUPERVISED PRACTICE)
Aspect of test use, range of measures and context/nature of assessment: Administer, score, and provisionally interpret measures under a psychologist’s supervision (excludes projective and specialist measures). Can participate in feedback and co-sign the report.
Work setting and billing: Cannot practise independently; must work under a psychologist.
Training requirements: B.Psych or equivalent with a psychometry focus; 720 hours of practical work; 70% in a national exam set by the Professional Board.

CLINICAL, COUNSELLING AND EDUCATIONAL PSYCHOLOGIST
Aspect of test use, range of measures and context/nature of assessment: Administer, score, interpret, and report on all measures for which training has been received; may perform specialised assessments; and may use the results for diagnostic purposes with clients who have life challenges, including psychological distress and psychopathology (clinical), developmental and well-being problems (counselling), and to optimise learning and development in educational contexts (educational).
Work setting and billing: Can work in organisations and educational contexts or set up private practice and bill clients for assessment.
Training requirements: Relevant Master’s degree; 12-month internship in an approved context; 70% for a national exam set by the Professional Board; mandatory continuing professional development.

RESEARCH PSYCHOLOGIST
Aspect of test use, range of measures and context/nature of assessment: Administer and score measures (for which training has been received), and use the results for research purposes. Develop psychological measures.
Work setting and billing: Normally works in organisations or undertakes contract research.
Training requirements: Relevant Master’s degree; 12-month internship in an approved context; 70% for a national exam set by the Professional Board; mandatory continuing professional development.

INDUSTRIAL PSYCHOLOGIST
Aspect of test use, range of measures and context/nature of assessment: Administer, score, interpret, and report on all measures for which training has been received to determine suitability for employment, training, and development, and to determine the effectiveness of individuals, groups, and organisations in the work environment. Can develop psychological measures.
Work setting and billing: Normally employed in work settings.
Training requirements: Relevant Master’s degree; 12-month internship in an approved context; 70% for a national exam set by the Professional Board; mandatory continuing professional development.

NEUROPSYCHOLOGIST
Aspect of test use, range of measures and context/nature of assessment: Administer, score, interpret, and report on all psychological measures (for which training has been received) for the purposes of assessing, diagnosing, and intervening in psychological disorders arising from neuropsychological conditions.
Work setting and billing: Normally works in health contexts (e.g. hospitals) or sets up private practice and bills clients for assessment.
Training requirements: Relevant Master’s degree; 12-month internship in an approved context; 70% for a national exam set by the Professional Board; mandatory continuing professional development.

FORENSIC PSYCHOLOGIST
Aspect of test use, range of measures and context/nature of assessment: Administer, score, interpret, and report on all psychological measures (in which training has been received) for the purposes of, among other things, providing expert input, diagnoses, and psychological explanations in forensic/legal cases.
Work setting and billing: Normally works in a private practice and bills clients for assessment.
Training requirements: Relevant Master’s degree; 12-month internship in an approved context; 70% for a national exam set by the Professional Board; mandatory continuing professional development.

Please consult the Regulations Defining the Scope of the Profession of Psychology as amended (Department of Health, 2011)
for more detailed information on the scope of practice of the various
psychology professionals. You should also be aware that some
refinement and elaboration of the scope-of-practice regulations is
possible in the near future. For example, it is likely that the scope of practice for educational psychologists will be refined, and the publication of a list of psychological tests that registered counsellors may be trained to use, together with the purposes for which they can be used, is under consideration. You should thus follow the developments related
to the scope of practice of the various categories of psychology
professionals and how this might impact on the nature of the
assessment that they can perform by consulting the Web site of the
Health Professions Council of South Africa (http://www.hpcsa.org.za).
In Chapter 1, we pointed out that, as far as integrating and
synthesising assessment information in particular are concerned, the
assessment practitioner has to draw on all his or her knowledge and
experience in the field of psychology and not just experience in
assessment. Therefore, all the psychology professionals described in
Table 8.1, apart from completing modules in assessment, have to be
in possession of a degree in which they studied psychology in depth
as well as related disciplines (e.g. statistics). This provides them with
a broader knowledge base on which to draw during the assessment
process.
Furthermore, in addition to the Health Professions Act (No. 56 of
1974) and its regulations, the Employment Equity Act (No. 55 of
1998), the Protection of Personal Information Act (No. 4 of 2013), and
the Promotion of Access to Information Act (No. 2 of 2000) also apply
generally to assessment, and should be complied with.

❱❱ ETHICAL DILEMMA CASE STUDY 8.1

Benni is a psychometrist (supervised practice) who works for a psychologist. Benni’s friend approaches him with a problem. She needs to have her son’s
intellectual functioning assessed and a report needs to be forwarded to his
school. The problem, however, is that she cannot afford to pay the psychologist’s
fees. She thus asks Benni to do the assessment on her son as a favour and to
send a report on the assessment to the school. Advise Benni regarding whether
or not he should agree to his friend’s request. Tell him what the implications
would be if he performed the assessment and sent a report to the school on his
own, without the knowledge of the psychologist who employs him.

8.2.2.3 Other professionals who use psychological measures
Speech and occupational therapists use some assessment measures
that tap psychological constructs (e.g. measures of verbal ability or
visual-motor functioning). You will remember from Section 8.2.2.1, that
we looked at what is legally considered to be a psychological act. By
administering measures that tap psychological constructs, speech and
occupational therapists are performing psychological acts that fall
outside of the scope of their profession. Nevertheless, these
measures are essential tools that assist them in making appropriate
diagnoses and developing interventions. A practical solution to this
problem thus had to be found. Fortunately, through the process of test
classification, which we will discuss in the next section, the conditions
under which a measure may be used can be relaxed. The
Professional Boards for Speech Therapy and Occupational Therapy
thus interact with the Psychometrics Committee of the Professional
Board for Psychology to compile lists of the measures they use that may tap psychological constructs. The relevant boards and the
Psychometrics Committee discuss these lists and reach agreement on
the prescribed list of tests for the various professionals, as well as the
nature of the psychometrics and assessment training that trainees
should receive. It is also possible that other professionals (e.g. social
workers, paediatricians) might use measures in the course of their
work that might tap psychological constructs. A similar process of
engagement as described above will then have to be entered into with
such professionals to see whether, under specified conditions, the
Professional Board for Psychology will make it possible for them to
use psychological measures.
8.2.2.4 The classification and evaluation of psychological measures
Classification is a process whereby a decision is made regarding the
nature of a measure and, consequently, who may use it. There are
two main reasons why measures need to be classified in South Africa.
Firstly, as there are many assessment measures, of which only some
qualify as being psychological assessment measures, measures
have to be subjected to a classification process to determine whether
or not they should be classified as a psychological measure.
Secondly, as you saw in Section 8.2.2.2, various categories of
psychology professionals may use psychological measures to varying
extents, and certain other professionals can also be permitted to use
psychological measures in the course of their assessment (see
Section 8.2.2.3). The other purpose of classifying measures is
therefore to determine which categories of practitioners may
administer, score, interpret, and report on a particular measure.
Who undertakes the classification of measures in South Africa? In
Chapter 2, we pointed out to you that the Psychometrics Committee
of the Professional Board has been tasked with psychological
assessment matters, including the classification of psychological
measures, among others. The Psychometrics Committee only
assumed this function in 1996. Prior to this, the Test Commission of
the Republic of South Africa (TCRSA) undertook the classification of
measures on behalf of the Professional Board for Psychology.
According to the system of classification used by the TCRSA,
measures were classified as ‘not a psychological test’, an A-level test
(achievement measures), a B-level test (group tests of ability,
aptitude, and interest), or a C-level test (individually administered
intelligence and personality-related measures). The classification
decisions made by the TCRSA were based on the content domain
tapped by the measure, the nature of the administration and scoring,
and the extent to which the person being assessed could suffer
psychological trauma. After a few years of adopting the same system,
the Psychometrics Committee found that not only was the ABC
classification no longer in widespread use internationally, but also that
it unnecessarily complicated both practitioners’ and the public’s
perception of what a psychological measure is. The Psychometrics
Committee then developed a simplified classification system whereby
a measure is simply labelled as either a psychological measure or not.
How is it determined whether a measure is a ‘psychological
measure’? According to the Psychometrics Committee’s Guidelines
for Reviewers, ‘the simple rule of thumb is that a test is classified as
being a psychological test when its use results in the performance of a
psychological act’ (Professional Board for Psychology, 2010d, p. 1). It
is therefore not important whether a measure is of South African origin
or is imported/adapted; is administered individually, in a group context,
or via a computer; whether or not the test instructions are highly
structured; whether the scoring procedure is objective, requires
psychological judgement, or is computerised; whether the
interpretation is straightforward, requires professional expertise, or is
generated via computer. The main criterion is whether or not use of
the measure results in the performance of a psychological act as the
measure assesses a psychological construct.
The next aspect considered during the classification process is the
fact that the test content and the outcome of the assessment might
have a negative impact on the person being assessed (see Section
8.2.1). For this reason, the process of classifying a measure also
takes into consideration whether the nature of the content of the
measure and the results from it may have negative psychological
consequences for the individual. If so, the administration and feedback must be handled in a supportive, sensitive way, which, in turn, helps to determine the category of practitioner who may use the measure.
A final, yet crucial aspect of the current classification system is that
the test is subjected to a review or evaluation to ensure that it meets
the minimum requirements, especially in terms of its psychometric
properties and suitability for use in the multicultural South African
context. Consult Form 208 (Professional Board for Psychology,
2010b) to see what aspects are covered in the review. The matter of
the review of psychological and educational measures, and how to
conduct such reviews, is being keenly debated in South Africa and
across the world at present, especially in terms of whether there
should be a generic review system used by all countries (Bartram,
2012; Evers, 2012). While there are pros and cons in this regard,
Bartram (2012) points out that the test-review system developed by
the European Federation of Psychologists’ Associations (EFPA),
which represents 35 European countries, ‘represents the first attempt
at internationalisation’ (p. 200). Given the widespread adoption of the
EFPA system of test reviews, the Psychometrics Committee is
exploring whether it should be adopted for use in South Africa.
There has been some criticism that the South African classification
system is too simple and does not recognise that a psychological
measure could be used in different contexts for different purposes by
different psychology professionals. For example, a certain personality
questionnaire could be used by a registered counsellor to provide
career guidance or by an industrial psychologist to determine if an
employee should be placed in a management development
programme. The Psychometrics Committee is thus developing a
multilevel classification system in which an indication will be provided
for each measure regarding what aspects of its use and in what
context it may be used by the various categories of psychology
professionals. Visit the Web site of the Professional Board for
Psychology (http://www.hpcsa.org.za) to keep track of developments
in this regard.
Further criticisms of the classification and evaluation system in
South Africa relate to the review of tests submitted for classification.
One criticism is that the test reviews done during the classification
process are not published. Evers, Sijtsma, Lucassen and Meijer
(2010) argue that the ‘publication of test ratings is an effective way of
providing information on test quality to test users, test authors, and
others involved in the field of testing’ (p. 311). Another criticism relates
to whether the classification process must include the review of a test.
It is essential for ethical practice that psychological measures are of
sufficient quality and are psychometrically sound. But instead of
conducting a review of the measure, it might be better if the
Professional Board for Psychology publishes a regulation on the
standards or requirements that psychological measures must meet.
Such a regulation could be a useful resource for:
• test publishers to assess whether their measures meet the standards required
• psychological assessment practitioners to evaluate whether the measure that they want to use meets the requirements. This is important, as it is ultimately the assessment practitioner’s responsibility to choose psychometrically sound, appropriate, and relevant measures to use
• organisations to see if the measures that they are using meet the standards so that they cannot be accused of the unethical use of psychological assessment
• clients who could ask a psychological assessment practitioner whether the psychological measure used met the requirements
• educational institutions when training new assessment practitioners so that they know the standards that assessment measures must meet
• researchers to ensure that they use valid and reliable measures in their study of human behaviour.
The fact that the Professional Board for Psychology has not yet
published regulations on the standards or requirements that
psychological measures must meet was identified as being
problematic in a recent court judgement (ATP v. the President, the
Minister of Labour, and the HPCSA, 2017). The Psychometrics
Committee has been mandated by the Professional Board for
Psychology to draft such a regulation on the standards for
psychological measures. In the light of this, consideration might be
given to abolishing the review of psychological measures as part of
the test classification process. You can follow the developments
related to regulations on test standards and whether the classification
process continues to include a review of the measure by consulting
the Web site of the Health Professions Council of South Africa
(http://www.hpcsa.org.za).
Who submits measures for classification and how does the
process work? Test developers should, as a matter of course, send
any new measures to the Psychometrics Committee to be classified.
The Psychometrics Committee also has the right to request a test
developer or publisher to submit a measure for classification. Figure
8.1 provides an overview of the process to classify psychological
measures.
One final point about classification: Many people, particularly test
developers and publishers, are concerned that the process of
classifying a measure as a psychological measure restricts its use to
psychology professionals only, which thereby robs them of their
livelihood. However, classification of a measure by the Psychometrics
Committee does not impose any new restrictions on it. It is the Health
Professions Act (No. 56 of 1974) that restricts the use of a measure
that taps psychological constructs to the domain of psychology. In
fact, the classification process can allow for the relaxing of conditions
under which a measure can be used, thus making it possible for an
assessment practitioner other than a psychologist to use it.
Consequently, the classification process could actually result in the
measure being used more freely than it would have been otherwise.

Figure 8.1 Test classification process

8.2.2.5 The Professional Board for Psychology and the protection of the public
As we pointed out in Chapter 1, a main function of the Professional
Board for Psychology is to protect the public. In view of the statutory
regulation that restricts the use of psychological measures to
appropriately registered assessment practitioners, the public’s
interests are served in two ways.
Firstly, members of the public may contact the Professional Board
directly if they feel that assessment practitioners have misused
assessment measures or have treated them unfairly or
unprofessionally during the course of assessment. The Professional
Board will then contact the assessment practitioner to respond to the
complaint. Thereafter, the Professional Board will decide whether or
not to convene a disciplinary hearing. Should an assessment
practitioner be found guilty of professional misconduct, he or she
could be fined, suspended, or even struck off the roll, depending on
the severity of the offence. Assessment practitioners are well advised
to take out malpractice insurance as members of the public are
increasingly becoming aware of their right to lodge a complaint, which
may or may not be valid, but which, nevertheless, needs to be
thoroughly investigated and which may involve costly legal counsel.
Secondly, the Professional Board serves the interests of the public
by laying down training and professional practice guidelines and
standards for assessment practitioners. Competency-based training
guidelines for the various categories of psychology professionals
(which include assessment competencies) have been developed and
are published on the Web site of the Professional Board. Training
institutions are regularly inspected to ascertain whether they are
adhering to the training guidelines. As regards ethical practice
standards, the Professional Board has adopted the International Test
Commission’s International Guidelines for Test-use (Version 2000)
and released the Ethical Rules of Conduct for Professionals
Registered under the Health Professions Act, 1974 (Department of
Health, 2006). In the next section, we will focus on ethical, fair
assessment practices and standards.

8.3 Fair and ethical assessment practices


8.3.1 What constitutes fair and ethical assessment practices?
Based on the International Test Commission’s International Guidelines
for Test-use (Version 2000), one could define fair assessment
practices as those that involve:
the appropriate, fair, professional, and ethical use of assessment
measures and assessment results
taking into account the needs and rights of those involved in the
assessment process
ensuring that the assessment conducted closely matches the purpose
to which the assessment results will be put
• taking into account the broader social, cultural, and political context
in which assessment is used and the ways in which such factors
might affect assessment results, their interpretation, and the use to
which they are put.
By implication, to achieve the fair assessment practices outlined
above, assessment practitioners need to:
• have adequate knowledge and understanding of psychometrics, testing, and assessment, which informs and underpins the process of testing
• be familiar with professional and ethical standards of good assessment practice that affect the way in which the process of assessment is carried out and the way in which assessment practitioners treat others involved in the assessment process
• have appropriate knowledge and skills regarding the specific measures they use
• have appropriate contextual knowledge and skills, which include knowledge of how social, cultural, and educational factors impinge on test scores and how the influence of such factors can be minimised; knowledge regarding appropriate selection of measures in specific contexts; and knowledge about policies regarding the use of assessment and assessment results
• have appropriate interpersonal skills to establish rapport with test-takers and put them at ease, to maintain the interest and cooperation of the test-takers during the test administration, and to provide feedback of the assessment results in a meaningful, understandable way
• have oral and written communication skills that will enable them to provide the test instructions clearly, write meaningful reports, and interact with relevant others (e.g. a selection panel) as regards the outcome of the assessment.

8.3.2 Why assessment practitioners need to ensure that their assessment practices are ethical
Simply put, an assessment context consists of the person being
assessed and the assessment practitioner. However, viewed from
another vantage point, there are, besides the test-taker and the
assessment practitioner, other role players with a vested interest in
the assessment process, such as organisations, educational
institutions, parents, and test developers, publishers, and marketers.
We will first focus on developing an understanding of the reason why
the assessment practitioner needs to adhere to ethical assessment
practices, and what the rights and responsibilities of test-takers are. In
Section 8.4, we will highlight the responsibilities of some of the other
role players as regards fair assessment practices.
The relationship between the person being assessed and the
assessment practitioner in many ways reflects a power relationship.
No matter how well test-takers have been prepared for the
assessment process, there will always be an imbalance in power
between the parties concerned where assessment results are used to
guide selection, placement, training, and intervention decisions.
Assessment practitioners hold a great deal of power as they have first-
hand knowledge of the assessment measures and will directly or
indirectly contribute to the decisions made on the basis of the
assessment results. Although test-takers hold some power in terms of
the extent to which they perform to their best ability, there is a definite
possibility that test-takers can perceive the assessment situation to be
disempowering. It is precisely because of this power that assessment
practitioners should ensure that it is not abused and that 'the client's
fundamental human rights’ (paragraph 10(1) of the Ethical Rules,
Department of Health, 2006) are protected by using fair or ethical
assessment practices. What are some of the important things that
psychological assessment practitioners should do to ensure that their
assessment practices are professional and ethical?

8.3.3 Professional practices that assessment practitioners should follow
The good assessment practices that assessment practitioners should
follow in every assessment context include:
• informing test-takers about their rights and the use to which the assessment information will be put
• obtaining the consent of test-takers to assess them, to use the results for selection, placement, or training decisions and, if needs be, to report the results to relevant third parties. Where a minor child is assessed, consent must be obtained from the parent or legal guardian. Consult paragraph 11 in the Ethical Rules (Department of Health, 2006) for a fuller discussion on informed consent
• treating test-takers courteously, respectfully, and in an impartial manner, regardless of culture, language, gender, age, disability, and so on
• being thoroughly prepared for the assessment session
• maintaining confidentiality to the extent that it is appropriate for fair assessment practices. See Chapter 3 of the Ethical Rules (Department of Health, 2006) for information on the importance of maintaining confidentiality, exceptions to this requirement, and instances in which psychology professionals can disclose confidential information
• establishing what language would be appropriate and fair to use during the assessment, and making use of bilingual assessment where appropriate
• only using measures that they have been trained to use
• administering measures properly
• scoring the measures correctly and using appropriate norms or cut-points or comparative profiles
• taking background factors into account when interpreting test performance and when forming an overall picture of the test-taker's performance (profile)
• communicating the assessment results clearly to appropriate parties
• acknowledging the subjective nature of the assessment process by realising that the final decision that they reach, while based at times on quantitative test information, reflects their 'best-guess estimate'
• using assessment information in a fair, unbiased manner and ensuring that anyone else who has access to this information also does so
• researching the appropriateness of the measures that they use and refining, adapting, or replacing them where this is necessary
• securely storing and controlling access to assessment materials so that the integrity of the measures cannot be threatened in any way.
Some of the above can be captured in a contract. Contracts between
assessment practitioners and test-takers are often implicit. It is,
however, important that contracts should be more explicitly stated in
writing, and that test-takers should be thoroughly familiarised with the
roles and responsibilities of the assessment practitioners, as well as
what is expected of the test-takers themselves. By making the roles
and responsibilities clear, the possibility of misunderstandings and
costly litigation will be minimised.

8.3.4 Rights and responsibilities of test-takers


According to Fremer (1997), test-takers have the right to:
• be informed of their rights and responsibilities
• be treated with courtesy, respect, and impartiality
• be assessed on appropriate measures that meet the required professional standards
• be informed prior to the assessment regarding the purpose and nature of the assessment
• be informed whether the assessment results will be reported to them, as well as the use to which the assessment results will be put
• be assessed by an appropriately trained, competent assessment practitioner
• know whether they can refuse to be assessed and what the consequences of their refusal may be
• know who will have access to their assessment results and the extent to which confidentiality will be ensured (see also Chapter 16).
According to the International Test Commission’s International
Guidelines for Test-use (Version 2000), test-takers have the
responsibility to:
• read and/or listen carefully to their rights and responsibilities
• treat the assessment practitioner with courtesy and respect
• ask questions prior to and during the assessment session if they are uncertain about what is required of them, about what purpose the assessment serves, or about how the results will be used
• inform the assessment practitioner of anything within themselves (e.g. that they have a headache) or in the assessment environment (e.g. noise) that might invalidate their assessment results
• follow the assessment instructions carefully
• represent themselves honestly and try their best during the assessment session.
The International Test Commission (2010) has also developed A Test-taker's Guide to Technology-based Testing. This will be discussed in
Chapter 14.

8.3.5 Preparing test-takers


Among the important characteristics of South African test-takers that
can potentially impact negatively on test performance is the extent to
which they are ‘test-wise’ (Nell, 1997). Given the differential
educational experiences that South African people were exposed to in
the past, ‘taking a test’ is not necessarily something that is within the
frame of reference of all South Africans. Therefore, if we wish to
employ fair assessment practices to provide all test-takers with an
equal opportunity to perform to the best of their ability on assessment
measures, we have to prepare test-takers more thoroughly prior to
assessing them. If practice examples, completed under supervision, are
available for each measure in the battery, or if practice tests are
available, test-takers can be prepared more thoroughly, not only for
the nature of the assessment questions, but also for what to expect in
the assessment context in general. Where computerised testing is
used, test-takers need to be familiarised with the computer keyboard
and/or mouse and need to be given the opportunity to become
comfortable with a computer before they are expected to take a test
on it (see Chapter 14).
Preparing all test-takers prior to a test session should not be
confused with the related, yet somewhat different, concept of
coaching. A considerable body of literature has been produced on
coaching and the positive impact it has on certain test scores,
especially for some achievement (especially mathematics) and
aptitude tests that have ‘coachable’ item types (Gregory, 2010).
Coaching includes providing extra practice on items and tests similar
to the ‘real’ tests, giving tips on good test-taking strategies, and
providing a review of fundamental concepts for achievement tests.
Coaching, however, has drawbacks: it not only invalidates test scores
as they are no longer a true representation of a test-taker’s level of
functioning, but, because coaching is often done by private companies
outside of the assessment context, it is unaffordable for less privileged
test-takers. The solution to the problems brought about by coaching is
to make self-study coaching available to all test-takers so that
everyone is given the opportunity to familiarise themselves with the
test materials.

8.3.6 Ethical dilemmas


Despite all the available ethical rules and guidelines, there are still
many grey areas that pose ethical dilemmas for psychology
professionals. De Bod (2011) generated a range of ethical dilemmas
by interviewing 10 industrial psychologists. Some of the dilemmas
identified were related to the professional context (e.g. scope of
practice, appropriateness of measures), ethical values (e.g.
confidentiality, transparency and fairness), and contextual pressures
(e.g. equity considerations when performing employment testing).
Levin and Buckett (2011) argue that opportunities and forums need to
be provided for psychology professionals to grapple with and
‘discourse around ethical challenges’ (p. 2). It is for this reason that we
have included a range of case studies in this book that focus on
common ethical dilemmas in assessment. We hope that you engage
your classmates in discussion when responding to the case studies.
8.4 Multiple constituents and competing values in the practice of assessment

8.4.1 Multiple constituency model
We pointed out in Section 8.3.2 that the parties who play a role in the
psychological assessment context include:
• the person being assessed (or the test-taker)
• the assessment practitioner (as well as a psychologist, should the assessment practitioner not be a psychologist)
• other parties with a vested interest in the process and outcome of the assessment (such as an organisation, human resources managers and practitioners, parents, labour unions, government departments, the Professional Board for Psychology, schools, and teachers)
• the developers, publishers, and marketers of measures.
The assessment context, then, consists of a complex set of
interrelated parties who might have different needs and values
regarding the necessity for and the outcome of the assessment. The
relationship between the role players involved in the assessment
context is illustrated in Figure 8.2 in terms of a model that may be
aptly named a multiple-constituency model of psychological
assessment. This model was adapted from Schmidt (1997), who
adapted the model from the competing-values model of organisational
effectiveness developed by Quinn and Rohrbaugh (1981).

8.4.2 Competing values: An example from industry


The utility value of psychological assessment for the multiple
constituents involved in the assessment context is largely dictated by
their own goals, needs, and values. By way of example, Table 8.2
provides information about some of the more important needs and
values of different parties when assessment is performed in industry
or occupational settings. This example is used because the Employment
Equity Act (No. 55 of 1998) has led to a growing realisation of the
large number of role players who have a stake in the use of
assessment measures and assessment results in industry.
Figure 8.2 A multiple-constituency model of psychological assessment (adapted from
Schmidt, 1997).

You will realise from Table 8.2 that there is some overlap between
the goals and values of the role players in assessment contexts in
industry, but perhaps for different underlying reasons (motives). For
example, all role players involved in assessment value accurate, valid,
and reliable assessment. However, assessment practitioners value
valid assessment measures as the use of such measures helps them
to pursue good practice standards; test developers and publishers, on
the other hand, value valid assessment measures because
assessment practitioners are more likely to buy such measures.
Table 8.2 The utility value of psychological assessment for stakeholders/role players in industry

Assessment practitioners, psychologists, HR managers
Directing goal: Generating valid and reliable assessment information.
Guiding values: Professionalism. Ethical conduct. Unbiased decisions. Promoting fairness and equity. Enhancing the contribution of assessment to company performance.
Underlying motive: Professional and ethical conduct.

Assessment, business, or employer organisations
Directing goal: Making valid and reliable decisions regarding selection, job placement, and training.
Guiding values: Assessment must be practical (easy to administer) and economical (as regards costs and time). Fair assessment practices must be followed.
Underlying motive: Understanding and improving individual or organisational functioning, performance, or development.

Test-taker
Directing goal: To present an accurate picture of themselves.
Guiding values: To be treated fairly. To be given the best chance of performing to their capabilities.
Underlying motive: To gain employment, or promotion, or to get an opportunity to further their development.

Unions
Directing goal: Pursuit of non-discriminatory personnel decisions.
Guiding values: Fairness. Equity. Unbiased decisions.
Underlying motive: Enhancing fairness and equity.

Professional Board, Government
Directing goal: Serving the interests of the public.
Guiding values: Fairness. Equitable assessment practices. Non-discriminatory practices. Setting standards. Controlling practices.
Underlying motive: Protecting the public and promoting the well-being of all citizens.

Developers, publishers, and marketers of measures
Directing goal: Developing valid and reliable measures.
Guiding values: Scientific and theoretical integrity in the development of measures. Obtaining and keeping their share of the market.
Underlying motive: Developing empirically sound measures. Selling assessment measures to make a living.

Given their differing values and motives, tensions might potentially arise between the constituents, for example:
• Test developers may sell measures that are suspect with regard to their validity and reliability, and this will anger assessment practitioners and organisations. On the other hand, assessment practitioners may use measures for inappropriate purposes and then claim that the measures are invalid, which will irritate the test developers.
• On the one hand, test-takers may claim that they were not informed about the purposes of the assessment, while on the other hand assessment practitioners may suggest that test-takers are intentionally trying to discredit the assessment.
• Assessment practitioners may claim that government regulations are forcing them to employ incompetent personnel, thus affecting their organisation's profitability, while the government may retaliate by claiming that organisations are practising discriminatory employment practices.

❱❱ ETHICAL DILEMMA CASE STUDY 8.2

Dr Phumzile Mthembu, our specialist consultant and selection expert, used
learning potential and managerial aptitude scores to predict performance in
an accelerated management-development programme (page back to
Ethical Dilemma Case Studies 3.1 and 4.1 to remind yourself about this
case). Test-takers became aware of their own rights, as well as the
obligations of the assessment practitioner, and felt that the assessment
practitioner had violated their rights. They consequently lodged a complaint
with the union, which formally filed charges against the company for using
discriminatory selection practices. The company in turn filed charges
against Dr Mthembu, who then blamed the test developer and distributor
for not divulging full information about the tests he had imported from overseas.

Questions
(a) Who is responsible for this situation, and why?
(b) In retrospect, what could have been done to avoid such a situation?

You will find more information in Chapter 15 on fair assessment practices,
issues, and competing values when assessing children and people with
physical or mental disabilities, as well as on the organisational, forensic,
psychodiagnostic, and educational contexts.

8.4.3 Responsibilities of organisations as regards fair assessment practices
In Section 8.3 we looked at the responsibility of the assessment
practitioner in ensuring that ethical assessment practices are followed.
Increasingly, attention is also being focused on the responsibility of
organisations as regards fair assessment practices.
The organisation inter alia has the responsibility of ensuring that:
• it has an assessment policy in place that reflects fair, ethical practices
• it employs assessment practitioners who are competent, who have been appropriately trained in the specific measures that they need to use, and who are mentored and supervised where necessary (especially when they are inexperienced)
• valid assessment measures are used for appropriate purposes
• assessment results are used in a non-discriminatory manner
• it has support mechanisms in place to assist assessment practitioners to build a research database that can be used to establish the fairness and efficacy of the measures used and the decisions made
• it regularly monitors the extent to which its assessment policy is being put into effect 'on the ground' and revises it where necessary.
Which aspects should an assessment policy cover? The International
Test Commission’s International Guidelines for Test-use Version 2000
(p. 26, see www.intestcom.org) suggests the following:

A policy on testing will need to cover most, if not all, the following issues:
• proper test use
• security of materials and scores
• who can administer, score, and interpret tests
• qualification requirements for those who will use the tests
• test-user training
• test-taker preparation
• access to materials and security
• access to test results and test-score confidentiality issues
• feedback of results to test-takers
• responsibility to test-takers before, during, and after test session
• responsibilities and accountability of each individual user.
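
To make this concrete, the checklist can be treated as a simple coverage audit. The following short Python sketch is purely illustrative (it is not part of the ITC Guidelines, and the variable and function names are our own); it records the checklist items quoted above and flags those that a draft policy has not yet addressed:

    # Illustrative only: checklist items quoted from the ITC Guidelines above;
    # the variable and function names are hypothetical.
    ITC_POLICY_ITEMS = [
        "proper test use",
        "security of materials and scores",
        "who can administer, score, and interpret tests",
        "qualification requirements for those who will use the tests",
        "test-user training",
        "test-taker preparation",
        "access to materials and security",
        "access to test results and test-score confidentiality issues",
        "feedback of results to test-takers",
        "responsibility to test-takers before, during, and after test session",
        "responsibilities and accountability of each individual user",
    ]

    def missing_policy_items(covered_items):
        """Return the checklist items that a draft policy does not yet cover."""
        covered = {item.lower() for item in covered_items}
        return [item for item in ITC_POLICY_ITEMS if item.lower() not in covered]

    # Example: a draft policy that so far deals with only two of the issues
    draft = ["proper test use", "security of materials and scores"]
    for gap in missing_policy_items(draft):
        print("Still to be addressed:", gap)

A simple audit of this kind can help the person responsible for the policy to track coverage when the policy is drafted or revised.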

If an organisation adopts an assessment policy, an individual should
be given the authority to see to it that the policy is implemented and
adhered to. Normally, one would expect this person to be a qualified,
experienced assessment practitioner with at least managerial-level
status in the organisation.
❱❱ CRITICAL THINKING CHALLENGE 8.2

You are working as a psychologist in a large organisation. You tested all the
trainee managers in your organisation when you were developing and testing
items for a new leadership test a year ago. As a result of this testing, the
content of the test was considerably refined as it was found that many of the
original items had questionable validity and reliability. The human resources
director now approaches you with the following request: ‘Please forward me
the leadership scores for each of the trainee managers whom you tested a
year ago, as we want to appoint the ones with the most leadership potential as
managers now.’ Write a memo to the human resources director in response to
this request. Motivate your response clearly.

CHECK YOUR PROGRESS 8.1


8.1 Explain the following statement: ‘Psychological assessment in
South Africa is under statutory control’.
8.2 Explain why psychological tests need to be kept securely.
8.3 Explain why organisations should have an assessment policy,
and the key aspects that such a policy should contain.
8.4 List the professionals who may use psychological tests in South Africa as well
as their qualifications and the restrictions placed on their use of the tests.
8.5 Tell your friend in your own words why it is so important that
assessment practitioners follow fair and ethical practices.
8.6 What is an ‘ethical dilemma’? Provide a few examples of
ethical dilemmas in testing and assessment. (Hint: read the
case studies in this book to get some ideas.)

8.5 Conclusion
You have now completed your first stop in the Assessment Practice
Zone. In this chapter you have learned why psychological assessment
is controlled by law and which categories of assessment practitioners
may use such measures. You have learned further why and how
psychological assessment measures are classified. You have been
introduced to ethical assessment practices for psychology
professionals. In the process you have probably realised that there are
many different aspects to consider when you are trying to develop an
understanding of what constitutes good and fair assessment practices.
Chapter 9
Administering psychological assessment measures
LOURA GRIESSEL

CHAPTER OUTCOMES
No matter how carefully an assessment measure is constructed, the results
will be worthless unless it is administered properly. The necessity of
following established procedures or guidelines for administering and
scoring psychological and educational assessment measures is one of the
hallmarks of sound psychological measurement. During this stop on our
journey through the Assessment Practice Zone, we will explore what an
assessment practitioner should do to ensure that a measure is
administered in the optimal way so as to achieve valid and reliable results.
By the end of this chapter, you will be able to:
› understand the importance of administering psychological
assessment measures correctly
› describe the assessment practitioner’s duties before
administering a psychological assessment measure
› describe the main duties of an assessment practitioner
during an assessment session
› describe the assessment practitioner’s duties after the assessment session
› understand the assessment challenges related to assessing
young children and people with disabilities.

9.1 Introduction
The basic rationale of assessment involves generalising the sample of
behaviour observed in the testing situation to behaviour manifested in
other non-test situations. A test score should help to predict, for
example, how the test-taker will achieve or how he or she will perform
on the job. However, assessment scores are influenced
by many factors. Any influences that are specific to the test situation
constitute error variance and may reduce test validity. Moreover, when
scores are obtained, there is a tendency to regard the observed score
as being representative of the true ability or trait that is being
measured. In this regard, the concepts of reliability and measurement
error are mostly concerned with random sources of error that may
influence test scores. In the actual application of assessment
measures, many other sources of error should be considered,
including the effects of test-taker characteristics, the
characteristics of the assessment practitioner, and the testing
situation (see also Chapter 17).
The procedure to be followed in administering an assessment
measure depends not only on the kind of assessment measure
required (individual or group, timed or non-timed, cognitive or
affective), but also on the characteristics of the people to be examined
(chronological age, education, cultural background, physical and
mental status). Whatever the type of assessment measure, factors
related to the individuals, such as the extent to which the test-takers
are prepared for the assessment and their level of motivation, anxiety,
fatigue, health, and test wiseness, can affect performance (see also
Chapter 17).
The skill, personality, and behaviour of the assessment practitioner
can also affect test performance. Administration of most assessment
measures requires that the assessment practitioner be formally
accredited by the Health Professions Council of South Africa. Such
requirements help to ensure that assessment practitioners possess
the requisite knowledge and skills to administer psychological
assessment measures.
Situational variables, including the time and place of assessment,
and environmental conditions such as noise level, temperature, and
illumination, may also contribute to the motivation, concentration, and
performance of test-takers.
In order to systematically negotiate these variables and the role
they play in administering assessment measures, we will discuss the
administering of psychological assessment measures in three
stages. Firstly, we will discuss the variables that the assessment
practitioner has to take into account prior to the assessment session;
secondly, the practitioner’s duties during the assessment session;
and finally, the duties of the assessment practitioner after the
assessment session (see Figure 9.1). Our discussion will deal mainly
with the common rationale of test administration rather than with
specific questions of implementation.
As we progress through our discussion, the reader would be well
advised to keep in mind that the key ethical testing practices that
assessment practitioners should follow and adhere to are highlighted
in the International Guidelines for Test-use developed by the
International Test Commission (ITC, 2001). The Guidelines can be
found on the ITC’s Web site (www.intestcom.org).
Against the internationally accepted backdrop of the core ethical
administration procedures, the discussion will also focus on the
administration of assessment measures in a multicultural South Africa.

Figure 9.1 Stages in administering psychological assessment measures

9.2 Preparation prior to the assessment session
In the following sections we will highlight the most important aspects
that should be taken care of during the various stages of administering
psychological measures. Good assessment practices require the
assessment practitioner to prepare thoroughly for an assessment
session.

9.2.1 Selecting measures to include in a test battery


Various factors influence the measures that the assessment
practitioner chooses to include in a test battery. Among these are:
• the purpose of the assessment (e.g. selection, psychodiagnostic, neuropsychological), the competencies or constructs that need to be assessed, and the types of decisions that need to be made based on the outcome of the assessment
• demographic characteristics of the test-taker (e.g. age, culture, education level, home language)
• whether the test-taker has a mental disability; when assessing mentally challenged individuals, it is important to choose a measuring instrument that fits the mental age range of that person
• whether the test-taker is differently-abled (e.g. is partially sighted or deaf), and consequently the assessment needs to be adapted and tailored
• the amount of time available to perform the assessment
• the psychometric properties (validity and reliability) of available measures and the suitability of such measures for the particular test-taker or the purpose for which the assessment is being conducted
• whether the assessment practitioner has the necessary competencies to administer the measures selected and whether he or she is permitted to use these measures given the relevant scope of practice for the practitioner's professional category (see Chapter 8).

Research has indicated that neurological deficits, which include motor
dysfunction, cognitive decline, and behavioural changes, may occur in up to 50
per cent of people with HIV/Aids. In order to identify neuropsychological functions
most affected, an assessment battery needs to be constructed that is
sensitive to the early detection of deficits. Based on previous research, an
assessment battery would include some of the following measures:
Depression Scale (Hamilton Depression Scale or Beck
Depression Scale)
HIV often compromises mood and emotional status.
Depression could also be the result of an emotional response to the
disease and could affect any other testing. For this reason, a measure
to assess depression needs to be included in such a battery.
Trail-making Test
This assessment measure assesses motor functioning, attention, processing of
information, and visual perception. It is especially the first three functions that are
compromised in HIV/Aids-related neuropsychological impairment.
Digit-symbol Coding Subtest of the WAIS-IV
A characteristic of neuropsychological impairment in the disease
process is the reduced speed of information processing. The Digit-
symbol Coding subtest assesses psychomotor and processing speed.
The WAIS-IV is an international test that has been adapted for use in
South Africa, with norms available for the ages 16 through to 89.
Rey-Osterrieth Complex Figure Test (ROCF)
Mental slowing is characteristic of neuropsychological impairment, and, in
later stages of the dementing process, affects both memory functions and
the ability to learn new information. To enable an examiner to assess these
abilities, this measure should be included in such a test battery.
Folstein’s Mini Mental Status Examination
This is a screening test to assess cognitive functioning. The test assesses
spatial and temporal orientation, concentration, memory, comprehension, and
motor functions. It identifies signs of impairment, but does not provide a
diagnosis. This assessment measure is the most widely used brief screening
measure for dementia and can be used on a large variety of population groups.
Adult Neuropsychological Questionnaire
This measure was initially designed to screen for brain dysfunction.
The items are designed to obtain information regarding subjectively
experienced neuropsychological functioning reported by the patient.
As it is a screening measure, no normative data is available.
Responses obtained are classified into domains representative
of possible neuropsychological deficits.
By using a basic battery, trends in neuropsychological functioning
can be identified, which could facilitate a better understanding of the
deficits. This could, in turn, assist employees, caretakers, and the
persons living with HIV/Aids to manage the illness through its various
stages. This battery can be extended or reduced depending on the
condition of the patient as well as the reason for such an assessment.
Examples of the types of measures to include in a test battery for
specific purposes (e.g. selection of managers) or for assessing
specific psychological, neuropsychological, and chronic disorders are
available and assessment practitioners could consult these to get an
idea of what measures they could include in the battery that they must
compile. The block above provides an example of a test battery to
assess the neuropsychological functioning of people with HIV/Aids
that was compiled on the basis of research.

❱❱ ETHICAL CONSIDERATIONS

The core ethical responsibility during testing and assessment is that nothing
should be done to harm the client. This implies that the test administrator
should be sensitive to test-takers’ cultural backgrounds and values.
In this regard it is important to note that the impact of certain types of
tests (i.e. self-report questionnaires and self-reflection tests, speed versus
power tests) and general familiarity with the laws governing testing could
differ in a multicultural South Africa. Expand on this viewpoint.

Source: Foxcroft, C.D. 2011a. Ethical issues related to psychological testing in Africa:
What I have learned (so far). Online Readings in Psychology and Culture, 2(2).
Retrieved from http://scholarworks.gvsu.edu/orpc/vol2/iss2/7.
https://doi.org/10.9707/2307-0919.1022

ADVANCED READING
Foxcroft, C.D. 2011a. Ethical issues related to psychological testing in Africa: What I
have learned (so far). Online Readings in Psychology and Culture, 2(2). Retrieved
from http://scholarworks.gvsu.edu/orpc/vol2/iss2/7.
https://doi.org/10.9707/2307-0919.1022
Paterson, H., & Uys, K. 2005. Critical issues in psychological test use in the South
African workplace. SA Journal of Industrial Psychology, 31(3), pp. 12–22.

9.2.2 Checking assessment materials and equipment


It is advisable to make a list of the number of booklets, answer
sheets, pencils, and other materials required, so that it will be easy to
check beforehand whether the correct materials and the correct
quantities are available. It is also advisable to have approximately 10
per cent more than the required quantity of the materials at hand, in
case extra material is needed for one reason or another (a poorly
printed page in a book, pencil points that break, and so on).
The assessment booklets and answer sheets should be checked
for any mistakes (e.g. misprints, missing pages, blank pages) as well
as for pencil marks that might have been made by previous test-takers
so that they can be erased or the booklet can be replaced.
If an individual with a disability is to be assessed, assessment
practitioners should consult someone who has experience in working
with people who have such a disability to find out whether the test
materials need to be adapted in any way. For example, for a partially
sighted test-taker, the test instructions and verbal items that the test-
taker must read need to be presented using a large-size font or with
the help of a 'text-to-speech' mechanism.
In the case of computer-based and Internet-delivered test
administration, among other things, the assessment practitioner
must ensure that the computers, tablets or smartphones that will be
used conform to the required specifications, and check that once the
test has been loaded, it functions correctly (i.e. that the way that the
test is displayed on the screen is correct).

9.2.3 Becoming familiar with assessment measures and instructions
Advance preparation for the assessment sessions concerns many
aspects. However, becoming thoroughly familiar with the general
instructions, as well as the specific instructions for the various
subsections of the measure, is one of the most important aspects.
The assessment practitioner should ensure that he or she knows
all aspects of the material to be used. He or she should know which
different forms of the assessment measure will be used, how the
materials should be distributed, where the instructions for each
subtest or measure in the assessment battery can be located in the
manual or on the computer screen, and the sequence of the subtests
or measures that comprise the assessment battery.
Memorising the exact verbal instructions is essential in most
individual assessments. Even in a group assessment in which the
instructions are read to the test-takers, some prior familiarity with the
statements to be read prevents misreading and hesitation, and
permits a more natural, informal manner during assessment
administration.

9.2.4 Checking that assessment conditions will be satisfactory
The assessment practitioner should ensure that seating, lighting,
ventilation, temperature, noise level, and other physical conditions in
the assessment venue and surroundings are appropriate. In
administering either an individual or a group assessment measure,
special provisions may have to be made for physically challenged or
physically different (e.g. left-handed) test-takers.
9.2.4.1 Assessment rooms
The choice of the assessment room is important and should comply
with certain basic requirements. The room should be in a quiet area
that is relatively free from disturbances. An ‘Assessment in progress –
Do not disturb’ sign on the closed door of the assessment room may
help to eliminate interruptions and other distractions.
The room should be well-lit and properly ventilated, since
physical conditions can affect the achievement of the test-takers.
There should be a seat available for every test-taker as well as an
adequate writing surface. Where young children are being assessed, it
will be necessary to have small tables and small chairs.
Where group testing is conducted, the size of the assessment
room should be such that every test-taker can hear the assessment
practitioner without difficulty, and can see him or her and any writing
apparatus he or she might use – blackboards, flip charts, and so on –
without interference. Large rooms are not suitable for group
assessment unless assessment assistants are used. In large rooms
where there are several people together, test-takers are often
unwilling to ask questions concerning unclear instructions or problems
they experience. The availability of assistants can help to alleviate
this problem. As a guide, there should be one assessment
practitioner/assessment assistant per 25 to 30 test-takers. There
should be sufficient space around the writing desks so that the
assessment practitioner and assistants can move about freely and
reach each test-taker to answer questions or to check their work,
particularly during the practical exercises, so as to ensure that they
have understood and are following the instructions correctly. There
should also be enough space between the test-takers to ensure that
they cannot disturb one another, and to eliminate cheating.
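
The numerical guidelines in this chapter can be combined into a simple planning calculation. The following Python sketch is purely illustrative (the function names are our own); it assumes the conservative end of the staffing guideline above (one practitioner or assistant per 25 test-takers) together with the 10 per cent materials buffer suggested in Section 9.2.2:

    import math

    def staff_needed(test_takers, per_staff=25):
        """Practitioners plus assistants at one per 25 test-takers, rounded up."""
        return (test_takers + per_staff - 1) // per_staff

    def materials_needed(test_takers):
        """Quantity of each booklet/answer sheet, with roughly 10 per cent spares."""
        return test_takers + math.ceil(test_takers / 10)

    # Example: a group session with 120 test-takers
    print(staff_needed(120))      # 5 (e.g. one practitioner and four assistants)
    print(materials_needed(120))  # 132 of each booklet and answer sheet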
Telephones should be taken off the hook or should be diverted to
the switchboard. Test-takers should be asked to switch off their cell
phones, and assessment practitioners and assessment assistants
should do likewise.
As wall charts, sketches or apparatus may sometimes help or
influence test-takers, there should be no aids on the walls or in the
room that have any relation to the content of the assessment
measure.
While all of the above are considered basic requirements for an
assessment venue, Foxcroft (2011a, p. 6) adds the dimension of
‘where to test’. Normally, testing takes place at the assessment
practitioners’ rooms, classrooms, e-assessment laboratories, or in
training or assessment rooms in organisations − most of which are in
urban areas. For test-takers who are brought to the city from rural
areas to be tested, such test venues may be very foreign, alienating,
and anxiety-provoking, which can have a negative impact on test
performance. Foxcroft (2011a) argues that it would be better to assess
such test-takers in more familiar surroundings in their village. Schools
and community halls, for instance, could be used for assessment
purposes.
9.2.4.2 Minimising cheating during group assessment and using assessment
assistants
There are various ways of minimising cheating during group
assessments, including the following: seating arrangements (e.g.
leave an adequate space between test-takers); preparing multiple
forms of the assessment measure and distributing different forms to
adjacent test-takers; and using multiple answer sheets, that is,
answer sheets that have different layouts. The presence of several
roving assessment assistants further discourages cheating and
unruliness during assessment. These assistants, who should be
suitably trained beforehand, can also:
• provide extra assessment material when necessary, after assessment has started
• answer individuals' questions about instructions that are not clearly understood
• exercise control over test-takers; for example, ensure that pages are turned at the correct time, see that the test-takers stop when instructed to, ensure that they adhere to the instructions and that dishonesty does not occur.
Assessment assistants are best equipped to help if they have studied
the assessment measure and the instructions, and have also had the
opportunity to apply them. They should be informed in advance of
possible problems that may be encountered when the assessment
measures are administered.
It is very important that assessment assistants should be well
informed about what will be expected of them, so that there will be no
uncertainty about their role in relation to that of the assessment
practitioner in the assessment situation. The assessment practitioner
is ultimately in charge of the assessment situation. However, the
assessment practitioner’s attitude towards the assessment assistants
should be such that the latter do not get the impression that they are
fulfilling only a minor routine function without any responsibility.
Therefore, through his or her conduct, the assessment practitioner
should always make assistants aware of the importance of their
tasks.
Under no circumstances should an assessment practitioner and
his or her assessment assistants stand around talking during the
assessment session, as this will disturb the test-takers. An
assessment practitioner should never stand behind a test-taker and
look over his or her shoulder at the test booklet or answer sheet. This
kind of behaviour could make some test-takers feel very
uncomfortable or even irritable.

9.2.5 Personal circumstances of the test-taker and the timing of assessment
The activities that test-takers are engaged in preceding the
assessment situation may have a critical impact on their
performance during assessment, especially when such activities have
led to emotional upheaval, fatigue, or other perturbing conditions.
Studies have found, for example, that when military recruits are
assessed shortly after intake – that is, after a period of intense
adjustment due to unfamiliar and stressful situations – their scores are
significantly lower than when assessed later, once their situations
have become less stressful and more familiar (Gordon & Alf, 1960).
Studies have also suggested that when students and children are
subjected to depressing emotional experiences, this impacts
negatively on their assessment performance. For example, when
children and students are asked to write about the worst experience in
their lives versus the best thing that ever happened to them, they
score lower on assessment measures after describing a depressing
emotional experience than when they describe more enjoyable
experiences (McCarthy, 1944).
A person’s physical well-being at the time of assessment is very
important. If a child has a head cold, or is suffering from an allergy
(e.g. hay fever), he or she will find it very difficult to concentrate and
perform to the best of his or her capabilities on the assessment
measures. In such instances it would be wise to reschedule the
assessment when the person being assessed feels better. It is well
known that medication can impact on levels of alertness (e.g. creating
drowsiness), as well as cognitive and motor functioning. The
assessment practitioner thus needs to know beforehand whether the
test-taker is on any medication and how this may impact on test
performance.
The time of day when an assessment session is scheduled is also
very important. Young children, elderly people, and those who have
sustained head injuries often tire easily and should thus be assessed
fairly early in a day, before other activities have sapped their energy.
For example, if a child attends pre-school all morning and is only
assessed in the afternoon, the assessment practitioner will have great
difficulty eliciting cooperation; the tired, irritable child will not give
of his or her best.
Where a person has an emotional, behavioural, or neurological
condition or disorder, the timing of assessment is critical if valid
assessment data is to be obtained. The motor and mental functions of
a person suffering from depression may be slower because of the
depression. A hyperactive child may be so active and distractible that
it is impossible for him or her to complete the assessment tasks. In
such instances, it is better to first treat the mood or behavioural
disorder before the individual is assessed, as the presence of the
disorder will contaminate the assessment results. In cases where a
person has sustained a head injury or suffered a stroke
(cerebrovascular accident), comprehensive assessment immediately
after the trauma is not advised, as there can be a rapid change in
brain functioning during the acute phase following the brain trauma.
Comprehensive assessment of such individuals should only take
place three to six months after the trauma, when their condition has
stabilised.
When the assessment is being performed in a multicultural
context, ‘you have to acquire knowledge of the test-taker in relation to
his or her cultural, family, linguistic, educational, and socio-economic
background and heritage’ (Foxcroft, 2011a, p. 3). There are many
things that assessment practitioners can do to ‘immerse themselves in
the world of their test-taker’ (Foxcroft, 2011a, p. 3). For example, to
gain insight into your test-taker’s context, you could use community
genograms (Ivey, 1995; Ivey, Ivey & Simek-Morgan, 1997), in which a
visual representation of the test-taker's family and community is
generated, and positive stories are narrated.
The purpose of the assessment can also impact on
performance. Where the assessment is for high-stakes purposes (e.g.
admission to a university, pre-employment assessment) and the
results have direct personal implications (Barry, Horst, Finney, Brown
& Kopp, 2010), test-takers are likely to be more tense and anxious.
Assessment practitioners thus need a thorough knowledge of
the individuals whom they assess, prior to assessing them. In so
doing, the assessment practitioner is ‘adopting an emic approach in
which human behavior is examined using criteria related to a specific
culture as opposed to using behavioral criteria that are presumed to
be universal (etic approach)’ (Foxcroft, 2011a, p. 3). This is a key step
in authentically employing an African-centred approach to
assessment. The information gathered to gain an understanding of the
test-taker’s context further helps assessment practitioners to plan
whether and when to assess, what intervention might be required
before assessment can take place, aspects to be sensitive to in the
testing process, and what to consider when interpreting test results in
a contextually sensitive way.

❱❱ ETHICAL CONSIDERATIONS

What is, in your opinion, the main ethical challenge that assessment
practitioners face in Africa?
Source: Foxcroft, C.D. 2011a. Ethical issues related to psychological testing in Africa:
What I have learned (so far). Online Readings in Psychology and Culture, 2(2).
Retrieved from http://scholarworks.gvsu.edu/orpc/vol2/iss2/7.
https://doi.org/10.9707/2307-0919.1022

9.2.6 Planning the sequence of assessment measures and the length of assessment sessions
Where a battery of measures is to be administered, care should be
taken to sequence the measures in an appropriate way. A measure
that is relatively easy and non-threatening is usually administered
first. This will help the test-takers to settle in and not feel
overwhelmed at the beginning of the assessment. It will also give
them the opportunity to gain more self-confidence before having to
face the difficult tasks. Having successfully negotiated the first
measure, test-takers often relax and appear to be less anxious.
Measures that require intense concentration, complex reasoning,
and problem-solving are usually placed towards the middle of an
assessment battery. Placing them at the beginning of an assessment
session could be very threatening to the test-takers. Placing them at
the end, when they are tired, could result in the assessment
practitioner not being able to elicit the best possible effort and
performance from the test-takers. The final measure in the battery
should also be a relatively easy, non-threatening measure, which
paves the way for the test-takers to leave the assessment session on
a positive note.
The length of an assessment session should be planned
thoroughly in advance. The length depends mainly on the level of
development and mental and physical status of the test-takers.
Care should be taken, however, to ensure that the length of the
assessment session is planned in such a way that fatigue resulting
from too many assessment measures in succession does not play a
detrimental role in performance.
The assessment period should seldom be longer than 45 to 60
minutes for preschool and primary school children, and 90 minutes
for secondary school learners, as this corresponds to the period of
time that they can remain attentive to assessment tasks.
Concentration may be a problem for many children and adults with
chronic illnesses (e.g. epilepsy, asthma, diabetes, cardiac problems).
This needs to be considered during assessment by providing sufficient
breaks and by having reasonably short assessment sessions. More
than one session may thus be required for administering longer
assessment batteries to children, as well as to adults with emotional or
neurological disorders, a mental disability or a chronic illness.
9.2.7 Planning how to address linguistic factors
South Africa is a multilingual country. Ideally, one would argue that
every test-taker has a right to be assessed in his or her home (first)
language. However, as was discussed in Chapter 7, various
complications arise when trying to put this ideal into practice in South
Africa. The most important among these are that:
• for the foreseeable future the majority of measures will still be available in English and Afrikaans, as it will take a number of years to adapt and translate measures into the different African languages (see Chapter 7, Section 7.4)
• many South Africans have been or are being educated in their second or third language, either by choice or as a result of apartheid educational policies of the past. Particularly in instances where a measure taps previously learned knowledge (e.g. certain subtests of intelligence measures), it may be fairer to assess test-takers in the language medium in which they were or are being educated, instead of in their first language, or to perform bilingual assessment.
In view of the dilemma faced by assessment practitioners, the
following guidelines based on Annexure 12 of the Ethical Rules of
Conduct for Professionals Registered under the Health Professions
Act, 1974 (Department of Health, 2006) could be useful:
• A measure should only be administered by an assessment practitioner who possesses a sufficient level of proficiency in the language in which it is being administered.
• A test-taker should be assessed in a language in which he or she is sufficiently proficient. The assessment practitioner thus first has to determine the test-taker's proficiency in the language in which the measure will be administered.
• If a measure is administered in a test-taker's second or third language, the assessment process should be designed in such a way that threats to the reliability and validity of the measures are minimised. In this regard, the assessment practitioner (with the assistance of a trained interpreter if needs be) could make use of bilingual communication when giving test instructions, so as to ensure that the instructions are understood and the best possible performance is elicited. For example, the instructions could be given in English. The test-taker could then be asked to explain what they are expected to do. If there is any confusion, the instructions could be repeated in the home language of the test-taker, via an interpreter if necessary.
• When a trained interpreter is used, the assessment practitioner must get consent from the client to use the interpreter and must ensure that the confidentiality of the test results is maintained and that the security of the test is not threatened (paragraph 5(a), Annexure 12 of the Ethical Rules, Department of Health, 2006).
• When a trained interpreter is used and a scientifically verified adaptation and/or translation of the test is not available, it is critical that the assessment practitioner is aware of the limitations of the test results and their subsequent interpretation (paragraph 5(b), Annexure 12 of the Ethical Rules, Department of Health, 2006). These limitations should be pointed out when reporting the test results, the conclusions reached, and the recommendations made.
Prior to an assessment session, assessment practitioners thus need
to carefully consider in which language the assessment should be
conducted. Where necessary, an interpreter or assessment assistant
will need to be trained to give the instructions, if the assessment
practitioner is not proficient in that language.

❱❱ CRITICAL THINKING CHALLENGE 9.1

Lerato Molopo is 19 years old and completed Grade 12. She hails
from the rural areas of KwaZulu-Natal. Lerato wants to study at a
university and is applying for a bursary from the organisation where
you are employed as a psychometrist. She is referred for evaluation.
Critically reflect on the factors that would have to be taken into account
to make the assessment fair and ethical from a language perspective.
9.2.8 Planning how to address test sophistication
The effects of test sophistication (test-wiseness), or sheer test-taking
practice, are also relevant here. In studies with alternate forms of the
same test, there is a tendency for the second score to be higher. The
implication is that if a test-taker possesses test sophistication, and
especially if the assessment measure contains susceptible items, the
combination of these two factors can result in an improved score; in
contrast, a test-taker low in test sophistication will tend to be penalised
every time he or she takes a test that includes such components.
Individuals lacking exposure to specific test materials or test formats
may be at a disadvantage.
Brescia and Fortune (1989) have reported that children from a
Native American culture may not exhibit behaviours conducive to
successful test-taking due to a lack of experience. This includes not
reading tests items accurately or not having opportunities to practise
tasks required by the test. Test scores are therefore meaningless if
the test-taker has not had exposure to an optimal educational
environment (Willig, 1988) associated with specific test-taking. A test-
taker who has had prior experience in taking a standardised test thus
enjoys a certain advantage in test performance over one who is taking
his or her first test.
It has been shown that short orientation and practice sessions
can be quite effective in equalising test sophistication. Such
familiarisation training reduces the effects of prior differences in test-
taking experiences. Issues of test sophistication become imperative in
a multicultural and diverse South Africa where individuals may have
had differing experiences with standardised testing procedures.
Orientation sessions should be distinguished from coaching sessions
(Anastasi & Urbina, 1997).

❱❱ ETHICAL CONSIDERATIONS

It is well known that the extent to which a test-taker is 'test-wise' has a significant impact on test performance. Discuss the following statement:
‘With a person with no test-taking experience, it is ethical to carefully consider
whether it is even desirable to administer a test, and in such cases, other forms of
assessment and obtaining information should also be considered.’

Source: Foxcroft, C.D. 2011a. Ethical issues related to psychological testing in Africa:
What I have learned (so far). Online Readings in Psychology and Culture, 2(2).
Retrieved from http://scholarworks.gvsu.edu/orpc/vol2/iss2/7.
https://doi.org/10.9707/2307-0919.1022

9.2.9 Informed consent


Test-takers should be informed well in advance about when and
where the assessment measure is to be administered, what sort of
material it contains, and what it will be assessing. Test-takers deserve
an opportunity to prepare intellectually, emotionally, and physically for
assessment. Furthermore, by providing test-takers with sufficient
information about the assessment, they are in a better position to
judge whether they want to consent to assessment.
Informed consent is an agreement made by an agency or
professional with a particular person, or his or her legal representative,
to permit the administration of a psychological assessment measure
and/or obtain other information for evaluative or psychodiagnostic
purposes. The nature of the agreement reached should be captured
in writing, usually by making use of a pre-designed consent form.
The consent form usually specifies the purposes of the assessment,
the uses to which the results will be put, the test-taker’s, parents’, or
guardians’ rights, and the procedures for obtaining a copy of the final
report, as well as whether oral feedback will be provided.
Obtaining informed consent is not always a straightforward matter
and it may pose some challenges to the assessment practitioner. For
example, from whom should consent be obtained if the child to be
assessed lives with his or her grandparents in an urban area while his
or her parents work on a farm in a rural community? When a child is
the head of the household, as both parents have died as a result of of
HIV/Aids, can the child give consent for a younger sibling to be
assessed? How do you go about obtaining informed consent from
people living in deep rural areas to whom the concept of taking a test
may be completely unknown? Readers are referred to Foxcroft
(2011a, 2017) for a discussion on issues related to informed consent
within an African context.

Figure 9.2 Preparation prior to the assessment session

Figure 9.2 provides a summary of the assessment practitioner’s
duties prior to the assessment session.

❱❱ ETHICAL CONSIDERATIONS

Obtaining informed consent could present certain challenges for assessment
practitioners who work in the African context. Reflect critically on these challenges.
Source: Foxcroft, C.D. 2011a. Ethical issues related to psychological testing in Africa:
What I have learned (so far). Online Readings in Psychology and Culture, 2(2).
Retrieved from http://scholarworks.gvsu.edu/orpc/vol2/iss2/7.
https://doi.org/10.9707/2307-0919.1022

ADVANCED READING: INFORMED CONSENT

If you read paragraph 46 (c) of Annexure 12 of the Ethical Rules of
Conduct for Professionals Registered under the Health Professions Act,
1974 (Department of Health, 2006) you will see that there are three
instances in which informed consent is not necessary. What are these?
Informed consent is not necessary when psychological evaluation is a legal
requirement; where it needs to be established whether a person’s mental
capacity is diminished; and for routinely conducted assessment in educational
and organisational settings (e.g. pre-employment assessment). Even in these
instances, the psychology professional still has the responsibility to explain the
nature and purpose of the assessment to the client.
Here’s something for you to think about. Why do you think that in the three
instances identified in the Ethical Rules, informed consent is not necessary?

Source: Levin, M.M., & Buckett, A. 2011. Discourses regarding ethical challenges in
assessment: Insights through a novel approach. SA Journal of
Industrial Psychology, 37(1), pp. 251–263.

❱❱ CRITICAL THINKING CHALLENGE 9.2

Thembi Dlamini, a psychologist, is preparing to administer a self-report personality
questionnaire to a group of Grade 9 learners in a secondary school. There are 45
learners in the class. Thembi requests someone to assist her, but her request is
turned down by the assessment manager on the grounds that it would be too costly
to pay someone to assist her. The assessment manager tells her that the class is
small enough for her to be able to handle the administration of the questionnaire on
her own. Identify the ethical dilemma that Thembi faces and
suggest what she can do to ensure that she conducts the
assessment in an ethically responsible way.

9.3 The assessment practitioner’s duties during assessment administration
9.3.1 The relationship between the assessment
practitioner and the test-taker
9.3.1.1 Adopting a scientific attitude
The assessment practitioner should adopt an impartial, scientific,
and professional attitude when administering an assessment
measure. It sometimes happens that an assessment practitioner
wants to help some individuals, or adopts a different attitude towards
them because he or she knows them and unconsciously urges them
to do better. Under no circumstances should this be allowed to
happen. Indeed, it is the assessment practitioner’s duty to ensure that
every test-taker does his or her best, but he or she may not assist
anyone by means of encouraging facial expressions, gestures, or by
adding words of encouragement to the instructions.
The race of the assessment practitioner, as well as the effects it
might have, has generated considerable attention due to concern
about bias and impartiality. However, Sattler (1988) found that the
effects of the assessment practitioner’s race are negligible. One of the
reasons so few studies show effects of the assessment practitioner’s
race on results is that the procedures for properly administering
measures are so strict and specific. Thus, assessor (examiner)
effects might be expected to be greatest when assessors are given
more discretion in how they use assessment measures. Even though
most standardised assessment measures have a very strict
administration procedure, there are numerous ways in which
assessment practitioners can communicate a hostile or friendly
atmosphere, or a hurried or relaxed manner. These effects may not be
a function of the race of the assessment practitioner per se but may
be reflected in strained relationships between members of different
groups.
9.3.1.2 Exercising control over groups during group assessment
The assessment practitioner should exercise proper control over the
assessment group. The test-takers should obey the test instructions
explicitly. They should respond immediately and uniformly. Discipline
should be maintained throughout, but in such a way that test-takers
still feel free to ask questions about aspects that are not clear to them.
As the assessment practitioner is in command, he or she alone gives
instructions or answers questions, unless assessment assistants are
being used. The assessment practitioner should issue an instruction
only when every test-taker is able to comply with it. He or she should
wait until an instruction has been carried out by everyone before
giving the next one and should ensure that everyone in the group is
listening and paying attention before giving an instruction.
9.3.1.3 Motivating test-takers
An important aspect in the administration of an assessment measure
is the test-takers’ motivation to do their best. To motivate test-takers
positively is, however, not easy, since no direct reward can be offered
to them for good achievement and no procedure can be prescribed as
to how they can best be motivated. Furthermore, no single approach
can be used to ensure that each test-taker will be equally motivated.
The assessment practitioner needs to learn from experience which
techniques or approaches will ensure the greatest degree of
cooperation and motivation on the test-takers’ part. When poorly
motivated, test-takers soon reveal symptoms of listlessness and lack
of attention.
One way of motivating test-takers is to ensure that they will benefit
in one way or another from the assessment. The point of departure
should be that the assessment is being conducted for the sake/benefit
of the individual. If test-takers know that the results of psychological
assessment will yield something positive, they will be better motivated
to do their best. Where assessment measures are used for the
purpose of selecting the most suitable candidate for employment, the
candidates are usually self-motivated, simplifying the assessment
practitioner’s task.
Assessment practitioners should be aware of the effects of
expectancy and of reinforcing responses when motivating test-
takers. A well-known line of research in psychology has shown that
data obtained in experiments sometimes can be affected by what an
experimenter expects to find (Rosenthal & Rosnow, 1991). This is
called the Rosenthal effect. Even though studies on expectancy
effects are often inconsistent, it is important to pay careful attention to
the potentially biasing effect of expectancy. The effect of
reinforcement on behaviour is also well-known. Several studies have
shown that reward can have an important effect on test performance
(Koller & Kaplan, 1978). Because reinforcement can be such a potent
influence, it is very important to administer tests under controlled conditions.
9.3.1.4 Establishing rapport
In assessment administration, rapport refers to the assessment
practitioner’s efforts to arouse the test-takers’ interest in the
assessment measure, to elicit their cooperation, and to encourage
them to respond in a manner appropriate to the objectives of the
assessment measure. In ability assessment, this objective calls for
careful concentration during the test tasks so that optimal performance
can be achieved. In self-report personality measures, the objective
calls for frank and honest responses to questions about one’s usual
behaviour, while in certain projective techniques it calls for full
reporting of associations evoked by the stimuli, without any censoring
or editing of content. Other kinds of assessment may require other
approaches. But, in all instances, the assessment practitioner must
endeavour to motivate the test-takers to follow the instructions as fully
and conscientiously as they can. Normally, the test manual provides
guidelines for establishing rapport. These guidelines standardise this
aspect of the assessment process, which is essential if comparable
results are to be obtained from one test-taker to the next. Any
deviation from the standard suggestions for establishing rapport
should be noted and taken into account in interpreting performance.
In general, test-takers understand instructions better, and are
better motivated, if the assessment practitioner gives instructions
fluently, without error, and with confidence. A thorough knowledge of
the instructions is conducive to self-confident behaviour on the part of
the test-taker. Uncertainty causes a disturbing and uncomfortable
atmosphere. The consequences of a poor knowledge of instructions
and a lack of self-confidence can give rise to unreliable and invalid
data, as well as a decrease in rapport between the assessment
practitioner and the test-taker or group of test-takers.
An assessment practitioner who does not have a good relationship
with test-takers will find that they do not pay attention to the
instructions, become discouraged easily when difficult problems are
encountered, become restless, and criticise the measures. A good
relationship can be fostered largely through the calm, unhurried
conduct of the assessor.
Although rapport can be more fully established in individual
assessment, specific steps can also be taken in group assessment to
motivate test-takers and relieve their anxiety. Specific techniques for
establishing rapport vary according to the nature of the assessment
measure, and the age and characteristics of the people to be
assessed. For example, preschoolers tend to be shy in the presence
of strangers, are more distractible, and are less likely to want to
participate in an assessment session. For this reason, assessment
practitioners should not initially be too demonstrative but should rather
allow for a ‘warming up’ period. Assessment periods should be brief
and contain varied activities that are introduced as ‘games’ to arouse
the child’s curiosity. The game approach can usually also be used to
motivate children in the early grades of school. Older school-age
children can be motivated by appealing to their competitive spirits as
well as their general desire to do well on tests.
When assessing individuals from an educationally
disadvantaged background, the assessment practitioner cannot
assume that they will be motivated to excel at test tasks that might
appear to be academic in nature. Special rapport and motivational
problems may also be encountered when assessing prisoners or
juvenile delinquents. Such persons are likely to manifest
unfavourable attitudes such as suspicion, fear, insecurity, or cynical
indifference, especially when examined in institutions.
Negative past experiences of test-takers may also adversely
influence their motivation to be assessed. As a result of early failures
and frustrations in school, for example, test-takers may have
developed feelings of hostility and inferiority toward academic tasks,
which psychological assessment measures often resemble. The
experienced assessment practitioner will make a special effort to
establish rapport under these conditions. In any event, he or she must
be sensitive to these special difficulties and take them into account
when interpreting and explaining assessment performance.
When assessing individuals, one should bear in mind that every
assessment measure presents a potential threat to the individual’s
prestige. Some reassurance should therefore be given from the
outset. For example, it is helpful to explain that no one is expected to
finish or to get all the items correct. The test-taker might otherwise
experience a mounting sense of failure as he or she advances to the
more difficult items or is unable to finish any subsection within the time
allowed.
Adult assessment, which is most often used in industry, presents
additional problems. Unlike the school child, the adult is less likely to
work hard at a task without being convinced that it is necessary to do
so. Therefore, it becomes more important to sell the purpose of the
assessment to the adult. The cooperation of the adult test-taker can
usually be secured by convincing them that it is in their own interests
to obtain a valid score – that is, to obtain a score that accurately
indicates what they can do, rather than one that overestimates or
underestimates their abilities. Most people will understand that an
incorrect decision, which might result from invalid assessment scores,
could mean subsequent failure, loss of time, and frustration for them.
This approach can serve not only to motivate test-takers to try their
best on ability measures, but can also reduce faking and encourage
frank reporting on personality measures, as test-takers realise that
they would be the losers otherwise. It is certainly not in the best
interests of individuals to be admitted to a career or job where they
lack the prerequisite skills and knowledge, or are assigned to a job
they cannot perform, based on inaccurate assessment results.
Assessment results may thus be influenced by the assessment
practitioner’s behaviour during the assessment. For example,
controlled investigations have yielded significant differences in
intelligence assessment performance as a result of a ‘warm’ versus a
‘cold’ interpersonal relation between the assessment practitioner and
the test-taker, or a rigid or aloof versus a natural manner on the part of
the assessment practitioner. Moreover, there may be significant
interactions between assessment practitioner and test-taker
characteristics, in the sense that the same assessment practitioner
characteristics or manner may have a different effect on different test-
takers as a function of the test-taker’s own personality characteristics.
Similar interactions may occur with task variables, such as the nature
of the assessment measure, the purpose of assessment, and the
instructions given to test-takers.

9.3.2 Dealing with assessment anxiety


Assessment anxiety on the part of the test-takers may have a
detrimental effect on performance. During assessment, many of the
practices designed to enhance rapport also serve to reduce
anxiety. Procedures that dispel surprise and strangeness from the
assessment situation, and which reassure and encourage the test-
taker, will help to lower anxiety. The assessment practitioner’s own
manner, as well as how well-organised he or she is and how smoothly
the assessment proceeds, will contribute towards the same goal.
Studies indicate that achievement and intelligence scores yield
significant negative correlations with assessment anxiety (Sarason,
1961). Of course, such findings do not indicate the direction of causal
relationships. It is possible that students develop assessment anxiety
because they have experienced failure and frustration in previous
assessment situations. It appears likely that the relation between
anxiety and assessment performance is non-linear in that a slight
amount of anxiety is beneficial, while a large amount is detrimental. It
is undoubtedly true that chronically high anxiety levels exert a
detrimental effect on school learning and intellectual development.
Anxiety interferes with both the acquisition and the retrieval of
information. However, it should also be mentioned that studies have
called into question common stereotypes of the test-anxious student
who knows the subject matter but freezes when taking an
assessment. This research has found that students who score high on
an assessment anxiety scale obtain lower grade-point averages and
tend to have poorer study habits than those who score low on
assessment anxiety (Hill & Sarason, 1966).
Research on the nature, measurement, and treatment of
assessment anxiety continues at an ever-increasing pace. With regard
to the nature of assessment anxiety, two important components have
been identified: emotionality and concern. The emotionality
component comprises feelings and physiological reactions, such as
tension and increased heartbeat. The concern or cognitive component
includes negative self-orientated thoughts, such as an expectation of
doing poorly, and concerns about the consequences of failure. These
thoughts draw attention away from the task-orientated behaviour
required by the assessment measure and thus disrupt performance.
Considerable effort has been devoted to the development and
evaluation of methods to treat assessment anxiety. These include
behaviour therapy, which focuses on procedures for reducing the
emotional component of assessment anxiety. The results generally
have been positive, although it is difficult to attribute the improvement
to any particular technique as a result of methodological flaws in the
evaluation studies. Performance in both assessment and coursework
is more likely to improve when treatment is directed to the self-
orientated cognitive reactions. Available research thus far suggests
that the best results were obtained from combined treatment
programmes, which included the elimination of emotionality and
concern, as well as the improvement of study skills (Hagtvet &
Johnsen, 1992). Assessment anxiety is a complex phenomenon with
multiple causes, and the relative contribution of different causes varies
with the individual. To be effective, treatment programmes should be
adapted to individual needs.
While assessment contexts may be perceived to be stressful in
general, for a child or adult suffering from asthma or epilepsy, it is
possible that a perceived stressful situation could elicit an asthma or
epileptic attack. It is thus important that the assessment practitioner
establishes a relationship with the test-taker prior to the assessment
so as to make the assessment situation as unstressful as possible.
However, the practitioner should be alert to possible signs of stress
and/or of an asthma/epileptic attack throughout the session.
Practitioners should also receive the necessary first-aid training to
know how to respond appropriately when someone has an asthma or
epileptic attack.

❱❱ CRITICAL THINKING CHALLENGE 9.3

Suppose you must ask a four-year-old child to copy a circle. Explain how
you would obtain the cooperation of, and establish rapport with, the child.

9.3.3 Providing assessment instructions


Assessment instructions, which have been carefully rehearsed
beforehand, should be read slowly and clearly. Providing instructions
is the most important task the assessment practitioner has to
perform in the whole process of assessment. The purpose of a
standardised assessment measure is to obtain a measurement that
may be compared with measurements made at other times. It is
imperative that the assessment practitioner gives the directions in
precisely the way that they are presented in the manual. The
assessment practitioner’s task is to follow the printed directions,
reading them word for word, without adding or changing anything. It is
of the utmost importance that there should be no deviation from the
instructions, so that each test-taker will have exactly the same
opportunity to understand what is to be done, and to work according to
the same instructions. Only if assessment measures are administered
strictly according to the prescribed instructions can test-takers’ scores
be compared.
Whether in an individual context or a group context, the
assessment practitioner should position him or herself in such a way
that interaction with test-takers is facilitated. Often the manual
indicates how the assessment practitioner should be positioned
relative to the test-taker. But when an African-centred approach to
assessment is adopted, practitioners should be sensitive to cultural
norms related to seating and eye contact. For example, Moletsane
(2004) administered the Rorschach Inkblot Method (RIM) test to a
group of South African test-takers who had not been exposed to
Western-oriented psychological tests. The preferred seating
arrangement for the RIM is that the test-taker and assessment
practitioner sit side by side, which could be appropriate in cultural
contexts in Africa in which ongoing eye contact with an elder or
authority figure is regarded as disrespectful. However, Moletsane
(2004) found that 30 per cent of the test-takers were uncomfortable
with this seating arrangement. Consequently, when the test-takers
were retested following an adjusted procedure, test-takers were given
the choice of where they preferred to sit relative to the assessment
practitioner − side by side, opposite each other, or both near each
other at the corners of the table. Interestingly, Moletsane (2004) found
that 60 per cent of the test-takers chose to sit opposite the
assessment practitioner. What this study highlights is the importance
of being flexible in terms of the seating arrangement that makes the
test-taker feel most comfortable and at ease, and to not presume
which seating arrangements might be more appropriate for certain
cultural groups.
After each instruction, the assessment practitioner should check to
ensure that the test-taker understands the instructions and has not
misinterpreted any aspect. Although test-takers should always be
given the opportunity to ask questions, the experienced assessment
practitioner always tries to avoid unnecessary questions by giving the
instructions as fully and clearly as possible. Questions should always
be answered on the basis of the guidelines provided in the manual.
9.3.4 Adhering to time limits
Assessment practitioners should always adhere strictly to the
stipulated assessment times. Any deviation from these times will
render the norms invalid. When assessment measures that have
time limits are administered, a stopwatch should be used in order to
comply with time limits strictly.
When assessment measures are administered to large groups, it is
advisable to tell the test-takers to put down their pencils when the
command to stop is given and to sit back in their seats. In this way, it
will be easier for the assessment practitioner to ensure that there is
close adherence to the time limit.
Where the instructions stipulate that test-takers should be given a
break, the length of the breaks should also be strictly adhered to.
These breaks are an integral part of the standardisation of the
assessment measure and should not be omitted, even when an
assessment practitioner notices that time is running out in relation to
the period of time set aside for the assessment session. If sufficient
time is not available, the assessment measure should rather be
administered on a subsequent day.

9.3.5 Managing irregularities


The assessment practitioner should always be alert to any
irregularities and deviations from standardised procedures that
might arise. The assessment practitioner should be aware of signs
related to low motivation, distractibility, and stress in the test-taker.
Close attention should be paid to signs of excessive nervousness,
illness, restlessness, and indifference, and the assessment
practitioner may have to intervene if it appears that these factors are
resulting in the test-taker not being able to perform optimally. If the
assessment practitioner knows that a test-taker has a chronic
condition, and how this condition could impact on the test-taker’s
concentration and motivation levels, plans can be put in place to
combat this in order to obtain maximum participation. For example, a
diabetic child may need to eat during the assessment.
The assessment practitioner should also be alert to any strange
happenings in the assessment environment (e.g. a phone ringing, loud
noises, noisy laughter). He or she should keep a careful record of
factors related to the test-taker as well as environmental and
situational factors that could impact on test performance, and should
take these factors into account when interpreting the results.

9.3.6 Recording assessment behaviour


Assessment practitioners should keep a close record of a test-
taker’s behaviour during an assessment session. Which tasks
seemed to cause the most anxiety? Which tasks seemed to be the
easiest? How well was the test-taker able to attend to the assessment
tasks? Was it easy or difficult to establish rapport with the test-taker?
How effectively was the test-taker able to communicate the answers to
the assessment questions? How did the test-taker solve complex
problems? To answer this question, the assessment practitioner will
have to ask the test-taker how he or she set about solving a problem
presented during the assessment session. All the information gained
by observing assessment behaviour should be used to amplify and
interpret the assessment findings.
Figure 9.3 Assessment administration

Figure 9.3 provides a summary of the assessment practitioner’s
duties during an assessment session.

❱❱ CRITICAL THINKING CHALLENGE 9.4

What, in your opinion, should be done in the training of assessment
practitioners to enable them to develop the necessary knowledge and
sensitivity to work in an ethical way in multicultural and multilingual contexts?
Sources: Foxcroft, C.D. 2011a. Ethical issues related to psychological testing in
Africa: What I have learned (so far). Online Readings in Psychology and Culture,
2(2). Retrieved from http://scholarworks.gvsu.edu/orpc/vol2/iss2/7.
https://doi.org/10.9707/2307-0919.1022
Moletsane, M., & Eloff, I. (2006). The adjustment of Rorschach
Comprehensive System procedures for South African learners.
South African Journal of Psychology, 36, pp. 880−902.

9.3.7 Specific suggestions for assessing young children and individuals with physical and mental disabilities
While a general overview of the assessment practitioner’s duties with
respect to the administration of assessment measures has been
provided above, the assessment of young children and individuals
with physical and mental disabilities poses challenges for the
assessment practitioner. Some suggestions are provided below to
assist practitioners who assess young children and differently-abled
individuals.
9.3.7.1 Specific suggestions for assessing young children
Your skills as an assessment practitioner are tested to the limit when
you find yourself trying to assess a hyperactive three-year-old. What
are some of the special skills that an assessment practitioner needs to
effectively assess a young child?
The assessment practitioner has to literally get down to the child’s
level. By bending or kneeling you can look the child in the eye. In this
way you will immediately get the child’s attention and you will send a
strong non-verbal message to the child about his or her importance.
Do not begin the assessment immediately. Spend some time getting
to know the child first. This can also serve the purpose of allowing the
child to become familiar with the assessment environment. The more
comfortable and relaxed the child feels, the more likely it is that he or
she will cooperate and give of his or her best.
Introduce the assessment to the child as a series of games to be
played. This usually sounds attractive to children and helps them to
understand what might happen in an assessment session. The
assessment practitioner needs to stimulate a desire on the child’s part
to give his or her best effort. A friendly approach is important, and the
assessment practitioner needs to show flexibility and adaptability
when encouraging a child to demonstrate his or her talents, even
when they are quite limited.
Young children require a specific and clear structure in an assessment
situation. The child needs to know exactly what is expected of him or
her. The assessment practitioner needs to maintain eye contact with
the child and ensure that he or she is listening to the test instructions.
It is important that the assessment practitioner establishes exactly
what is expected before the test task is performed. To ensure that the
child gives his or her best possible response, the assessment
practitioner should not hesitate to repeat the instructions or amplify
them where necessary.
Children often say ‘I don’t know how to do that’, especially regarding
test tasks that they find difficult. The assessment practitioner should
make every effort to elicit a positive effort from the child.
Children often want to erase their writing or drawings. In such an
instance they delete very important diagnostic information. It is
suggested that children should not be permitted to use erasers.
Instead, give them another sheet of paper if they want to redraw
something.
When an assessment practitioner notices that a child’s attention is
beginning to wander, a direct verbal challenge, such as ‘I want you to
listen carefully to me’, might be useful. Frequent breaks and short
assessment sessions make it easier for children to sustain their
attention.
9.3.7.2 Assessment of individuals with physical disabilities
Whatever their age, the assessment of individuals with physical
disabilities (e.g. motor, visual, and hearing disabilities) poses special
challenges for the assessment practitioner as regards administering
the measures and interpreting performance in an unbiased way.
Simple guidelines that can be followed to ensure that measures are
administered fairly to disabled people do not seem to exist. Essentially
the assessment needs to be adapted and tailored by modifying the
test items, content, stimuli, material, or apparatus, adjusting or
abandoning time limits, and choosing the most appropriate medium
(e.g. visual, auditory, or tactile) to use. The assessment practitioner
has to make decisions regarding how to adapt or modify a measure
for use with disabled individuals and should consult the manual to see
whether any information has been provided in this regard. National
disability organisations, as well as assessment practitioners who
specialise in assessing individuals with disabilities, can also be
consulted.
Given the lack of empirical information about how disabled people
perform on assessment measures, whether or not they have been
modified, it is often more ethical and fair to use the assessment results
in a qualitative way. Assessment results can be used to gain an
indication of a particular characteristic being assessed. This
information can then be augmented by collateral information and
observations from, and interviews with, teachers and caregivers who
deal with the individual every day. Ideally, the assessment of disabled
individuals should take place within a multidisciplinary team context
involving various professionals such as paediatricians, medical
specialists, ophthalmologists, psychologists, occupational therapists,
educationalists, and so on.
While individual assessment can be tailored in relation to the
capabilities of the individual being assessed, group assessment,
which is often used in industry and educational settings, is more
difficult to tailor. It is very difficult to modify the mode of administration
for disabled individuals when they form part of a group of people being
assessed. Furthermore, if the time limit is lengthened for disabled
people, they become conscious that they are being treated differently
and those without a disability may feel that disabled people are being
given an unfair advantage.
There are thus unresolved psychometric and ethical issues
regarding the assessment of individuals with disabilities. To an extent,
some of the issues may never be resolved, as each individual
presents a unique combination of capabilities, disabilities, and
personal characteristics (Anastasi & Urbina, 1997). Nonetheless, there
is a growing body of information regarding how to assess disabled
people (Geisinger & McCormick, 2016). Furthermore, technological
advances, such as voice synthesisers and computer-controlled
electronic devices, present interesting possibilities for the innovative
assessment of individuals with disabilities (Anastasi & Urbina, 1997).
9.3.7.3 Assessment of mental disability
The intellectual level used to demarcate mental disability is an IQ
score of 70 to 75. Why an IQ score of 70 to 75? Well, think back to the
discussion of test norms in Chapter 3 and the fact that you were made
aware in the same chapter that the mean score on an intelligence test
is 100 with a standard deviation of 15. An IQ score of 70 to 75
represents approximately two standard deviations below the mean,
which is well below average (i.e. ‘subaverage’).
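
To make this norm logic concrete, here is a small worked sketch in Python. The values are simply those given above (a mean of 100 and a standard deviation of 15); the snippet expresses the cut-off scores as z-scores:

# Worked example of the norm reasoning above: a deviation IQ
# expressed as a z-score (number of SDs from the mean).
mean, sd = 100, 15
for iq in (70, 75):
    z = (iq - mean) / sd
    print(f"IQ {iq}: z = {z:+.2f}")

Running this gives z = −2.00 for an IQ of 70 and z = −1.67 for an IQ of 75, which is why the demarcation range is described as approximately two standard deviations below the mean.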
The purpose of assessing mentally challenged people is to be able
to design appropriate training programmes and to place them in programmes that
could optimise their potential. If information can be gathered from a
variety of professionals (e.g. occupational therapists) and teachers,
the assessment data will provide a more comprehensive picture of the
individual and what he or she is capable of in his or her everyday life.
If an individual is attending a special school or training centre, teacher
or therapist observations need to form part of the assessment. An
occupational therapy assessment with a view to further placement
after school is vital. Job evaluation is an integral part of a mentally
challenged individual’s assessment.
The use of information from different people dealing with the child
is necessary as parents are not always objective when providing
information regarding their mentally disabled child. When interviewing
parents, the assessment practitioner should ask what the child can do
rather than a direct question such as ‘Can he feed himself?’ This type
of question can give the assessment practitioner inaccurate
information. Parents also often attribute the child’s poor performance during
assessment to lack of sleep, hunger, or an unfamiliar
environment. It is thus important to make sure that there are not too
many people around and that the child is not frightened or sleepy. Home
assessment, if possible, can cancel out some of the variables used as
excuses for the child’s poor performance.
When assessing mentally challenged individuals, it is important to
choose a measuring instrument that fits the mental age range of that
person. The diagnostic and screening developmental measures and
individual intelligence measures discussed in Chapters 10 and 15 can
all be used to determine whether intellectual functioning falls in the
mentally disabled range. Measures of adaptive behaviour in everyday
situations (e.g. the Vineland Adaptive Behaviour Scales) are also
typically included in an assessment battery for mentally challenged
persons. It is important to start the assessment with items that will
elicit success. Concentration is usually problematic and assessment
needs to be scheduled over a few sessions.

ADVANCED READING

Geisinger, K. & McCormick, C. 2016. Testing individuals with
disabilities. In F.T.L. Leong, D. Bartram, F.M. Cheung, K.F. Geisinger
& D. Iliescu, (Eds.), The ITC International handbook of testing and
assessment (pp. 259-275). New York, NY: Oxford University Press.
Joyce-Beaulieu, D. & Calleung, C.M. 2016. Children and adolescents.
In F.T.L. Leong, D. Bartram, F.M. Cheung, K.F. Geisinger & D.
Iliescu, (Eds.), The ITC international handbook of testing and
assessment (pp. 276-289). New York, NY: Oxford University Press.

9.4 The assessment practitioner’s duties after assessment administration
9.4.1 Collecting and securing assessment materials
After administering assessment measures, the assessment
practitioner should collect and secure all materials. The booklets and
answer sheets must be counted and collated, and all other collected
materials checked to make certain that nothing is missing.
The safekeeping of assessment measures and results is closely
related to the confidential nature of the assessment process itself. In
Chapter 8, the reasons for control over psychological measures were
provided. Psychological assessment measures are confidential and
potentially harmful if they are used by unqualified people. They should
thus be safely locked away (e.g. in a strong cabinet or safe) when not
in use. The keys of such a cabinet should be available only to
authorised assessment practitioners.

ADVANCED READING

Foster, D.F. 2010. Worldwide testing and test security issues: Ethical
challenges and solutions. Ethics & Behavior, 20(3–4), pp. 207–228.

CHECK YOUR PROGRESS 9.1


9.1 When administering tests, what actions on the part of the assessment
practitioner could tend to make the test-takers’ scores less valid?
9.2 How does the role of the assessment practitioner giving an
individual test differ from that of a medical technician collecting
a blood sample for analysis?
9.3 What principles should guide the person who, for purposes of legal
procedures, is assigned to administer a test to an uncooperative adult?
9.4 In assessing a group of potential applicants who have indicated that they
are all fluent in English, the assessment practitioner finds that an applicant
is having great difficulty following the English instructions. The applicant
asks many questions, requests repetitions, and seems unable to
comprehend what is desired. What should the assessment practitioner do?
9.5 In preparing for an assessment session where a diverse group of
applicants have to be tested as part of an organisation’s selection
procedure, you find that some of the applicants have never had
exposure to group standardised testing whereas some of them have had
considerable exposure to tests. How would you manage this situation?
9.6 Explain to a trade union leader whether, as is mandated by the Employment
Equity Act, assessment measures can be used in a fair, nondiscriminatory way
with job applicants who have a physical or mental disability.
9.7 What are some of the important things to keep in mind when you are
assessing young children?
9.8 What are some of the important things to keep in mind when
assessing people with physical or mental disabilities?

9.4.2 Recording process notes, scoring, and interpreting the assessment measures
Having administered the measures, the assessment practitioner
should write up the process notes immediately, or as soon as
possible after the session. The process notes should contain the date
on which the assessment took place, which measures were
administered, as well as any important observations made about the
behaviour of the test-taker during the session and any other important
information that will assist in the interpretation of the findings.
The assessment measures also need to be scored, norm tables
need to be consulted (where appropriate), and the findings need to be
interpreted. These aspects will be covered in greater detail in Chapter
16.
Figure 9.4 provides a summary of the assessment practitioner’s
duties after the assessment session.

❱❱ CRITICAL THINKING CHALLENGE 9.5

Page back to Chapter 2 (Section 2.4.4.3), where the question of who may use
psychological tests was raised in relation to a section in a judgment that stated that it
is ‘unlikely that the primarily mechanical function of the recording of test results
should be reserved for psychologists’ (Association of Test Publishers of South Africa
and Another, Saville and Holdsworth South Africa v. The Chairperson of the
Professional Board of Psychology, 2010, p. 20). Now that you have worked through
this chapter and have an idea of what administering a psychological test entails, what
is your view on whether psychological test administration is ‘primarily a mechanical
function’, which by implication can be performed by a non-psychology professional?
Do you agree or disagree with the judge, and why?
ADVANCED READING

If you want to know more about test administration, visit the following Web site:

www.intestcom.org

Figure 9.4 The assessment practitioner’s duties after assessment

9.5 Conclusion
In this chapter we introduced you to the importance of administering
psychological measures in a standardised way. Furthermore, we
outlined the responsibilities of the practitioner before, during, and after
assessment, and we highlighted ethical practices and dilemmas.
We have travelled a little way through the Assessment Practice
Zone now. You should have a broad understanding of what fair
assessment practices are, as well as some of the specific practices
that need to be adhered to when administering measures. Before we
journey further into assessment practice territory, we first need to gain
some insight into the different types of measures that are available.
So, we will be crossing into the Types of Measures Zone for a while.
Once we have visited the different types of measures, we will return to
the Assessment Practice Zone and consider issues related to
interpreting measures, reporting, factors that affect assessment
outcomes, as well as look ahead to the future of assessment.
Types of Measures Zone
In this zone you will be introduced to a variety of measures that can
be used to assess psychological constructs. You will be making
many stops along the way. Look out for the following signposts:

10 Assessment of cognitive functioning
11 Measures of affective behaviour, adjustment, and well-being
12 Personality assessment
13 Career counselling assessment
14 Computer-based and Internet-delivered assessment

Chapter 10
Assessment of cognitive functioning
RENÉ VAN EEDEN AND MARIÉ DE BEER

CHAPTER OUTCOMES
After studying this chapter you should be able to:
› explain the meaning of cognitive test scores
› discuss issues related to the use of cognitive measures
› define the different types of cognitive measures
› describe each type of measure and indicate where and how it is used
› give examples of cross-cultural cognitive research.

10.1 Introduction
What do we mean by cognitive functioning, how do we go about
measuring it, and how should we interpret the scores? These are
some of the key issues in the somewhat controversial field of cognitive
assessment discussed in this chapter. Individual differences in general
cognitive ability have been the subject of much scientific interest and
‘individual variation in general cognitive ability (GCA) matters greatly
for many life outcomes, including educational and occupational
success’ (Yeo, Ryman, Pommy, Thoma & Jung, 2016, p. 93).
The measurement of cognitive functioning has been an integral
part of the development of psychology as a science. Although
measurement has a much longer history, the main body of cognitive
assessment and measurement procedures as we know it today
has developed since the beginning of the last century. Some argue that
the lack of theoretical consensus on intelligence and, consequently,
also on the measurement of cognitive ability, and in particular the
cross-cultural measurement of cognitive ability, endangers the status
of this field as a science (Helms-Lorenz, Van de Vijver & Poortinga,
2003). Also, the fact that different measures provide different scores
has often been used as evidence in attacks on cognitive measures in
particular. However, the fact that, even in the physical sciences, there
are still differences of opinion after centuries of research (Eysenck,
1988) should guide and inspire the theory and practice of cognitive
assessment. In other words, much can be achieved that is useful,
even if the ideal situation has not yet been reached. This offers the
advantage that it stimulates further research and new developments.
One of the reasons why the measurement of cognitive ability has
remained on the forefront of psychological assessment is that it is a
clear and significant predictor of success in the work and educational
contexts. Research results from meta-analyses – which show
stronger evidence of trends in research results because they are
based on the collation of large numbers of similar studies – have
indicated that cognitive assessment results are generally valid
predictors of job performance and training results with predictive
validity correlations in the region of .5 to .6, thus indicating both
statistical and practical significance (Bertua, Anderson & Salgado,
2005). In a 100-year meta-analytic review of research on selection
methods, Schmidt, Oh, and Shaffer (2016) confirmed that general
cognitive ability remains the best predictor of job performance, and
job-related learning and training results.
This chapter provides a brief overview of the theoretical
underpinnings of intelligence. This serves as an important backdrop to
understand the assessment of cognitive functioning and the measures
used to do so, which is the main focus of this chapter.

10.2 Theories of intelligence: A brief history and overview
10.2.1 Background
A clear cyclical or pendulum movement can be traced throughout the
history of cognitive ability assessment. Initially, the focus was mainly
on psychophysical measures with the emphasis on objective methods
of measuring simple sensory processes, with the work of Galton and
Wundt’s psychological laboratory in 1879 as a good example of this
era in testing (Gregory, 2010). Hoffman and Deffenbacher (1992)
showed how applied cognitive psychology uses methods and theories
of experimental psychology to gain understanding of and solve real-
world problems regarding cognitive functioning. This field of applied
psychology evolved from the earliest experimental research in
psychology from the likes of Binet, Cattell, Munsterberg, and Wundt,
amongst others.
The French researchers Alfred Binet and Theodore Simon used
more complex tasks of reasoning and thinking to measure intelligence
when they were commissioned in 1904 to develop techniques for
identifying children in need of special education (Locurto, 1991), as
was pointed out in Chapter 2. According to Gould (1981), Binet
wanted the test scores to be used practically and not as the basis for a
theory of intellect. He wanted test scores to be used to identify
children who needed special help and not as a device for ranking
children according to ability. Lastly, he hoped to emphasise that
intellectual ability can be improved and modified through special
training. However, this was not the way in which the Binet-Simon
measure was used in practice. Recently there has been a renewed
focus on this more applied approach to cognitive assessment in what
is referred to as dynamic assessment (De Beer, 2013).
Another modern method that reflects the Binet-Simon approach to
assessment is computerised adaptive testing (CAT). As we will
point out in Chapter 14, in adaptive assessment the difficulty level of
items is calculated beforehand and the items are then interactively
selected during assessment to match the estimated ability level of the
test-taker. A South African test that utilises this technology is the
Learning Potential Computerised Adaptive Test (LPCAT) (De Beer,
2005, 2010). The modern version of this approach is based on Item
Response Theory (IRT) and the use of interactive computer
programmes. However, in principle, Binet and Simon applied these
same principles and used the assessment practitioner’s judgement to
decide which entry-level item the child should get first. They made use
of a ceiling level by terminating the measure if a certain number of
items at a specific level were answered incorrectly (Reckase, 1988).
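
The adaptive logic described above can be illustrated with a minimal sketch. The following Python code assumes a one-parameter (Rasch) IRT model; the item difficulties, responses, and simple update rule are hypothetical choices for illustration only and do not represent the LPCAT’s actual algorithm:

import math

def p_correct(theta, b):
    # Probability of a correct response under the Rasch (1PL) model.
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def next_item(theta, bank, used):
    # Select the unused item whose difficulty is closest to the
    # current ability estimate (the core idea of adaptive testing).
    return min((i for i in range(len(bank)) if i not in used),
               key=lambda i: abs(bank[i] - theta))

def update_theta(theta, b, correct, step=1.0):
    # One stochastic-approximation step on the Rasch log-likelihood:
    # shift the estimate by the residual (observed minus expected score).
    return theta + step * ((1.0 if correct else 0.0) - p_correct(theta, b))

bank = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]  # hypothetical item difficulties
theta, used = 0.0, set()
for correct in (True, True, False, True):      # hypothetical responses
    i = next_item(theta, bank, used)
    used.add(i)
    theta = update_theta(theta, bank[i], correct)
print(f"Ability estimate after four items: {theta:+.2f}")

In the same spirit as Binet and Simon’s entry levels and ceiling rules, each response moves the difficulty of the next item towards the test-taker’s estimated ability, so the measure concentrates on items that are maximally informative for that individual.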
During World Wars I and II, important contributions were made by
psychologists with regard to assessment for selection and placement
decisions (Hoffman & Deffenbacher, 1992). The influence of
computers and information-processing approaches was seen in the
1960s. However, some researchers warned that social, cultural,
motivational, and emotional factors should not be neglected when
attempting to fully understand cognition.
Some exciting new developments in the use of computerised
adaptive testing include what is referred to as cognitive diagnostic
assessment (CDA) (Gierl & Zhou, 2008; Xu, Wang & Shang, 2016).
This involves promising applications in the educational domain, where
diagnostic assessment can be linked to remedial instruction in areas
where lower performance levels are indicated (Gierl & Zhou, 2008).
Multidimensional computerised adaptive testing (MCAT) has been
shown to offer shorter tests, improve test precision, and increase
selection utility compared to traditional testing (Makransky & Glas,
2013). Other new developments are discussed in Section 10.7.
Another recurring theme is the controversy between Spearman’s
and Thurstone’s theories, namely the question of whether intelligence
is influenced by a single general factor or by multiple factors. Both
theories are still actively used in current research today (Carroll, 1997;
Helms-Lorenz et al., 2003; Johnson, Bouchard, Krueger, McGue &
Gottesman, 2004; Major, Johnson & Deary, 2012). The measurement
of emotional intelligence (EQ), which represents the area in which
cognitive and personality assessment are combined, has also been
receiving a lot of attention in the last two decades (Boyle, Humphrey,
Pollack, Hawver & Story, 2011; Matthews, Zeidner & Roberts, 2002).
On the social front, there is still active and ongoing debate on
cognitive ability assessment, the interpretation of test scores, and
particularly the differences between groups. Every now and again, a
new publication appears and, once again, controversy flares up. One
example is Herrnstein and Murray’s book The Bell Curve, published in
1994. Some earlier books that have had a similar effect are Gould’s
The Mismeasure of Man (1981), Fancher’s The Intelligence Men
(1985), and Eysenck and Kamin’s The Intelligence Controversy
(1981).
Here in South Africa, our country is adapting to the integration of
different cultures, and guidelines on equity and affirmative action have
been introduced in our legislation. In practice, this legislation also
affects the use of psychological assessment. We may well see a
repeat of the American pattern, that is, a movement away from and
subsequent return to assessment, as well as possible legal cases
about the fairness of measures for various groups. One of the ongoing
debates is the appropriateness of norms for various subgroups –
particularly in multicultural contexts such as South Africa
(Shuttleworth-Edwards, 2017).

10.2.2 Defining intelligence


Why has the study of intelligence, or the measurement of cognitive
functioning, involved so many problems and controversies? We need
to start by looking at the theoretical underpinning of the construct. It
may surprise you to learn that, even today, psychologists do not agree
on how to define cognitive functioning or intelligence, how to explain
exactly the way in which it functions, or how it should be measured.
This complicates efforts to understand the concept of cognitive
functioning and also makes it difficult to build theories or construct
procedures and methods to measure intelligence. Interestingly
enough, there are parallels in physics, where people still manage to
live with such unclear issues. For instance, in the case of gravity or
heat, there are no universally accepted definitions, theories, or
measuring devices. However, this has not caused the controversy that
we find with the measurement of intelligence (Eysenck, 1988).
In general, we can distinguish different types of intelligence, for
instance, biological intelligence, psychometric intelligence, and social
(or emotional) intelligence. When we discuss biological intelligence,
we focus on the physical structure and functioning of the brain in ways
that can be measured objectively. For example, we measure reaction
times to physical stimuli, although using much more sophisticated
methods than those used in the early measurement laboratories.
Psychometric intelligence implies that we use mainly standardised
psychological tests to measure levels of functioning on psychologically
defined constructs. According to the psychometric view, intelligence is
defined as ‘what intelligence tests measure’, which seems to be a
circular way to define a construct. These measures make use
various types of items and the assumption is that the ability to display
the type of behaviour needed to correctly answer these items reflects
some form of intelligent behaviour. Some exciting new developments
include the use of African art and artefacts as inspiration for cognitive
ability assessment (see Figure 10.1) on the premise that the material
is more familiar to the majority of test-takers (Bekwa, 2016). It is
always important to remember that all these measures base their final
scores on a limited sample of behaviour of the people being assessed.
The social or contextual (emotional) view of intelligence defines the
construct of intelligence in terms of adaptive behaviour and argues
that we must define intelligent behaviour within the particular context
where we find it. Applied cognitive psychology is an area of research
that endeavours to gain scientific understanding of cognitive
functioning and the ability to solve related practical problems (Hoffman
& Deffenbacher, 1992).
The value of tests based on these theories and models has been
recognised. Based on the assumption that the underlying constructs
measured by these tests are universal, tests have been adapted and
standardised for use in multicultural contexts (an etic approach).
Greater provision should, however, be made for an indigenous
conceptualisation of the nature and structure of intelligence (an emic
approach). Foxcroft (2011a) refers specifically to the consideration of
behavioural criteria relevant to and valued by a specific culture and in
a specific context.
Cultural environments determine which abilities are preferred and
promoted as meaningful (Mpofu, Ntinda & Oakland, 2012), which
implies that understanding and respecting the context within which
specific abilities are valued is vital. These authors refer to a broader
understanding of human abilities in the sub-Saharan African context
that reflects the value of social interests and benefits to the collective.
Related research indicated the value of abilities that are socially
significant and that benefit the group (Furnham, Ndlovu & Mkhize,
2009; Mpofu et al., 2012). In addition to intra- and interpersonal
domains, cognitive abilities and education were valued as means to
participate in modern economies. The need for abilities to be
functional was also reflected in the identification of the ability to
manage resources and to contribute productively.
These findings provide support for Sternberg’s (1984, 2012) notion
of contextual intelligence that proposes that environmental demands
and priorities determine which abilities are valued. Furthermore,
Wechsler’s (1944, p. 3) definition of intelligence as the capacity of the
individual to ‘act purposefully, to think rationally, and to deal effectively
with his [sic] environment’ retains its relevance also within different
cultural contexts.

10.2.3 Theories of intelligence


Many different theories of intelligence have developed over the
decades, together with many different approaches to the
measurement of intelligence. Although each of these theories
contributes to our general understanding of intelligence in its own way,
each has both supporters and critics – this highlights the controversial
nature of dealing with, and trying to understand and explain, human
cognitive functioning. Plomin and Petrill (1997) summarised the
essence of this dilemma when they emphasised that, although all men
are created equal, as the saying goes, this does not necessarily mean
that all men are created the same. We have to be aware that the
framework or perspective that we use to view people will affect the
way in which we interpret our observations and measurement results.
One general factor (g)
Spearman was the first person to suggest that a single general
factor (g) could be used to explain differences between individuals.
This view is based on the fact that different measures of cognitive
ability correlate positively with each other, indicating that they
measure some shared ability or construct. Even when multiple factors
are identified, second-order factor analysis usually indicates some
underlying general factor. On the other hand, factors specific to a
particular activity can also be identified and are known as specific
factors (s). As a result, we have the well-known two-factor theory of
intelligence that allows for both a general factor (g) and specific
factors (s) (Gregory, 2010). Cattell later maintained that Spearman’s g
could be split into two distinct g’s, which he called gf or fluid
intelligence and gc or crystallised intelligence (Gregory, 2010; Owen,
1998). Carroll (1997) reiterated the general agreement among experts
that some general factor exists and maintained that there is ample
evidence to support this view (Helms-Lorenz et al., 2003; Johnson et
al., 2004). Despite the contentions about the structure and nature of
intelligence, the major theories of intelligence that were developed in
the early twentieth century are still considered in more recent research
in the field of cognitive ability assessment (Major et al., 2012). Results of a large-scale study based on assessment data from 1960 confirmed a main general ability dimension, together with subdimensions of verbal ability, perceptual ability, and image rotation (Major et al., 2012).
Multiple factors
Thurstone was the main proponent of the multiple-factor theory. He
identified seven primary mental abilities, namely verbal
comprehension, general reasoning, word fluency, memory, number,
spatial, and perceptual speed abilities (Anastasi & Urbina, 1997).
Although there was lively debate between supporters of the one-factor
theory of intelligence and supporters of the multiple-factor theory,
Eysenck (1988) pointed out that the proponents of these two opposing
theories of intelligence were eventually forced to agree on a similar
view of the structure of intellect.
Biological measures (reaction time and evoked potential)
Physical and biological measures, such as reaction time and evoked potential, correlate significantly with performance on standardised intelligence measures. According to Vernon (1990), there is little doubt that speed of information processing forms an integral part of general intelligence. Because these measures do not rely on past learning, they can be administered to persons across a very wide range of ages and levels of mental ability. The biological approach was one of the first recorded approaches, and Wundt's laboratory, where a variety of physical measurements were recorded, served as the starting point.
Multiple intelligences
Gardner (1983) identified several mental skills, talents, or abilities
making up what he defined as intelligence. These are musical, bodily
kinaesthetic, logical-mathematical, linguistic, spatial, interpersonal,
and intrapersonal skills. This approach broadened the view of intelligence by including a variety of different facets that are considered to be part of intelligence; however, practical considerations limit the application of this view.
Cattell-Horn-Carroll model
The Cattell-Horn-Carroll (CHC) cognitive ability taxonomy has
received increasing support. It identifies three levels with g (general
intelligence) at the highest level (Cormier, Bulut, McGrew & Singh,
2017). The other two levels refer to broad and narrow abilities
respectively. The CHC model is grounded in empirical evidence and has been accepted by psychologists (James, Jacobs & Roodenburg, 2015), with implications for planning interventions from a theoretically grounded framework.
Stages of cognitive development
Piaget, a Swiss psychologist, developed a theory based on four identifiable stages of cognitive development (Gregory, 2010). The four stages are the sensorimotor (birth to two years); pre-operational (2 to 6 years); concrete operational (7 to 12 years); and formal operational (12 years and older) stages. According to Piaget, each of the four stages is qualitatively different from the others and is characterised by distinctive patterns of thought (Gregory, 2007).
Contextual intelligence
As mentioned in Section 10.2.2, Sternberg (1984) proposed that intelligence should be seen in terms of the contexts in which it occurs, rather than only as something we derive from test results (Brody, 2003a, 2003b; Sternberg, 2003, 2012). Socio-cultural factors and context
should also be taken into consideration. The individual’s ability to
adapt to real-world environments is important and Sternberg therefore
emphasised real-world behaviour, relevance of behaviour, adaptation,
shaping of environments as well as adapting to them, and purposeful
or goal-directed behaviour. He proposed a triarchic theory of
intelligence, which includes Componential (analytical) intelligence,
Experiential (creative) intelligence and Contextual (practical)
intelligence (Gregory, 2010; Sternberg, 2012). The Sternberg Triarchic
Abilities Test (STAT) is based on this theory (Brody, 2003a, 2003b;
Gottfredson, 2003; Sternberg, 2003). Let us take an example from
nature and look at the queen bee: although she has a limited range of
activities, she nevertheless plays a crucial role within that context.
Although certain behaviours may fall short of what is prescribed as the
norm when they are measured objectively, these same behaviours
may very well be representative of highly adaptive behaviour.
Conceptual intelligence and the systems or information-processing
approach
What was initially known as the information-processing approach has
since become known as the cognitive-processing approach to the
measurement of cognitive ability (Naglieri, 1997). According to this
approach, intelligence is seen as based on three components, namely
attentional processes, information processes, and planning processes.
The PASS (Planning, Attention, Simultaneous, and Successive)
theory is an example of this approach (Das, Naglieri & Kirby, 1994).
Dynamic assessment / learning potential assessment
Dynamic assessment is a specific approach to assessment aimed at
measuring learning potential. It incorporates training into the
assessment process in an attempt to evaluate not only the current
level of cognitive ability and performance, but also the potential future
level of performance. It is based on Vygotsky’s theory (Vygotsky,
1978), which distinguishes between the level of functioning a person
can reach without help and the level of functioning a person can reach
with help. Therefore, Vygotsky’s theory incorporates the view that lack
of educational or socio-economic opportunities affects cognitive
functioning and may prevent some people from reaching their full
potential. It makes provision for the effect of differences in educational
and social opportunities and focuses on the measurement of learning
potential. Grigorenko and Sternberg (1998) gave an extensive
overview of the work that has been done in this field. Many different
approaches exist, some of which attempt to change the level of
functioning (Feuerstein, Rand, Jensen, Kaniel & Tzuriel, 1987), while
others attempt in various ways to modify test performance only and
monitor the help that is required to reach certain predetermined levels
of functioning (Budoff, 1987; Campione & Brown, 1987). In South
Africa, Taylor (1994) has developed a family of measures (APIL-B,
TRAM1, TRAM2) based on his approach to measurement of learning
potential, and De Beer (2000a, 2000b, 2006, 2010) developed the
LPCAT, a dynamic, computerised adaptive measure for the
measurement of learning potential. These tests provide not only information on the current level of performance achieved by the individual but, by incorporating a learning experience as part of the assessment, also information on the projected future (potential) levels of achievement that could be attained if relevant learning opportunities are provided. By
providing a learning opportunity as part of the assessment, it takes
into account that individuals can differ considerably in terms of their
educational and socio-economic background – factors that are known
to influence cognitive performance – and attempts to provide an
opportunity for individuals to indicate their potential levels of
performance irrespective of their background. Dynamic assessment
(DA) has shown improved predictive validity compared to standard
tests (Caffrey, Fuchs & Fuchs, 2008; Davidson, Johannesen &
Fiszdon, 2016). Three concerns that are often noted about DA are ‘its
construct is fuzzy, its technical characteristics are largely unknown,
and its administration and scoring are labor intensive’ (Caffrey et al.,
2008, p. 255). De Beer (2010) reported how IRT and CAT address
these concerns.
While one of the criticisms against DA has been the limited empirical research on the psychometric properties of such measures (Grigorenko & Sternberg, 1998), a review of 24
studies by Caffrey, Fuchs, and Fuchs (2008) has shown the superior
predictive validity of DA for criterion-referenced tests with average
correlations of .49 between DA and achievement, compared to
average correlations of .41 between traditional cognitive testing and
achievement. A distinction can be made between clinically oriented DA and research- or psychometrically oriented DA (Caffrey et al., 2008;
De Beer, 2010). ‘Several studies have reported that DA can contribute
to the prediction of achievement beyond traditional tests’ (Caffrey et
al., 2008, p. 263).
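
To give a feel for how computerised adaptive testing (CAT) of this kind operates, the sketch below simulates a simplified adaptive loop under a one-parameter (Rasch) IRT model. It is a minimal illustration only: the item difficulties, the fixed-step ability update, and the fixed test length are hypothetical, and the sketch does not represent the LPCAT or any other published instrument.

import math
import random

# Illustrative only: a toy computerised adaptive testing (CAT) loop under a
# one-parameter (Rasch) IRT model. Item difficulties, the fixed-step ability
# update, and the test length are hypothetical.

def p_correct(theta, b):
    # Rasch model: probability of a correct response given ability theta
    # and item difficulty b.
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def run_cat(item_bank, true_theta, n_items=10):
    theta_hat = 0.0          # start from an average ability estimate
    administered = set()
    for _ in range(n_items):
        # Select the unused item whose difficulty is closest to the current
        # estimate: such items are the most informative under the Rasch model.
        i = min((j for j in range(len(item_bank)) if j not in administered),
                key=lambda j: abs(item_bank[j] - theta_hat))
        administered.add(i)
        # Simulate the test-taker's response from the model.
        correct = random.random() < p_correct(true_theta, item_bank[i])
        # Crude fixed-step update: raise the estimate after a correct
        # response, lower it after an incorrect one.
        theta_hat += 0.4 if correct else -0.4
    return theta_hat

item_bank = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
print(run_cat(item_bank, true_theta=1.0))

Because each item is chosen to match the current estimate, an adaptive test of this kind can cover a wide ability range with relatively few items, which is part of how IRT and CAT address the administration burden noted above.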

ADVANCED READING: THE INFORMATION-PROCESSING PARADIGM

The Cognitive Process Profile (CPP) was developed to measure cognitive
preferences and capabilities. It does so by externalising and tracking thinking
processes according to thousands of measurement points to assess a person’s
cognitive styles, processing competencies, learning potential, and most suitable
working environment. The CPP is a self-administering (but supervised)
computerised simulation exercise. Scoring and interpretation take place by
means of an expert system and automatic report generator. The CPP is used
within the work environment for purposes of selection, placement, succession,
development, and team compilation. Extensive research has been conducted on
the construct validity of the self-contained theoretical model on which the CPP is
based, as well as on the validity and reliability of the constructs measured by the
CPP. The Learning Orientation Index (LOI) has been developed for the cognitive
assessment of school and university leavers. It operates on the same principles
as the CPP, but involves entirely different task requirements.

Emotional intelligence
Emotional intelligence refers to the behavioural and interpersonal adjustment of the individual to a particular environment or situation.
Goleman (1995) argued that our traditional way of looking at
intelligence is much too narrow and does not allow for the role that our
emotions play in thought, decision-making, and eventually also in our
success. He challenged the narrow view that intelligence is merely
‘what intelligence tests measure’ or IQ, and argued that aspects such
as self-control, zeal, and persistence, as well as the ability to motivate
oneself, are important factors determining success in life. A distinction
can be made between the socio-emotional approach to the study of
emotional intelligence (a broad model that includes abilities as well as
a series of personality traits) and the view that links emotions with
reasoning (a view that focuses on abilities only). The first approach
includes the initial work by Salovey and Mayer (1990), and the work
by Goleman (1995) and Bar-On (1997), while the latter refers to the
adaptations by Mayer and Salovey (1997). Studies involving various
measures of emotional intelligence have indicated that the construct is
related to cognitive variables such as thinking styles as well as
personality variables such as leadership effectiveness and stress
management (e.g. Herbst & Maree, 2008; Murphy & Janeke, 2009;
Ramesar, Koortzen & Oosthuizen, 2009). Examples of measures of
emotional intelligence include objective measures, such as the
Multifactor Emotional Intelligence Scales (MEIS) and the Mayer,
Salovey and Caruso Emotional Intelligence Test (MSCEIT), and self-
report measures such as the Bar-On Emotional Quotient Inventory
(EQ-I).
Emotional intelligence (EI) has been linked to ‘emotional labour’
(O’Boyle, Humphrey, Pollack, Hawver & Story, 2011), which is
required by jobs with a heavy interpersonal component such as
service work and leadership positions. EI measures have been reported to predict job performance as well as or better than a combination of cognitive ability and personality measures, and to add unique predictive power over and above that provided by cognitive ability and personality (O'Boyle et al., 2011).

10.3 Interpreting the intelligence score and diversity issues
10.3.1 The meaning of the intelligence score
Misconceptions about interpreting and using cognitive test results
have been at the core of many controversies in this field. Owen (1998)
warned against the danger of:
• viewing results as if they represent an inherent and unchangeable ability
• the expectation that results are 100 per cent accurate
• the view that results are infallible and perfectly reliable.
In the preceding sections, the concept, theories, and measurement of
intelligence were discussed in the broad sense, including different
ways of defining and measuring intelligence. The use of intelligence
measures to obtain a score that reflects a person’s intellectual ability
has traditionally been associated with the measurement of
intelligence. Accepting that this is one way of defining and measuring
intelligence, we will now describe the intelligence score and explain
what is meant by the term.
An individual’s mental level or mental age corresponds to the age
of normal children or adults who perform similarly on a measure. The
term Intelligence Quotient (IQ) was introduced by the German psychologist Stern in 1912 in an attempt to enable us to compare scores across different age groups (Gregory, 2010). The ratio IQ is obtained by dividing mental age by chronological age (and, conventionally, multiplying the result by 100), and thus represents the relationship between mental age and chronological age.
It provides a measure of an individual’s relative functioning compared
to his or her age group and makes allowance for the development of
cognitive ability with increasing age up to adulthood.
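
Expressed as a formula, with hypothetical ages as a worked example:

\[
\text{IQ} = \frac{\text{mental age}}{\text{chronological age}} \times 100,
\qquad \text{e.g. } \frac{10}{8} \times 100 = 125
\]

A child whose mental age equals his or her chronological age therefore obtains a ratio IQ of exactly 100.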
If standardised scores are used, performance on a measure can
also be compared across age groups. Based on the mean and
standard deviation of performance of a specific age group, raw scores
can be converted to standard scores or normalised scaled scores
(these concepts were discussed in more detail in Chapter 3). It is then
possible to determine how an individual’s test performance compares
with that of his or her age group. Because the same scale is used for
all age groups, the meaning of a particular score also remains
constant from one age group to the next. In the case of intelligence
tests, the intelligence score is often expressed on a scale with a mean
of 100 and a standard deviation of 15. Although this is a standard
score and not a quotient, the term Deviation IQ is used to describe it.
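
In conventional notation, and assuming the usual mean of 100 and standard deviation of 15 mentioned above, the Deviation IQ is derived from the standard (z) score of a raw score x within the relevant age group:

\[
z = \frac{x - \mu_{\text{age group}}}{\sigma_{\text{age group}}},
\qquad \text{Deviation IQ} = 100 + 15z
\]

A raw score falling one standard deviation above the age-group mean (z = 1) therefore corresponds to a Deviation IQ of 115, regardless of the test-taker's age.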
We can explain what intelligence scores mean by using height as
an example. We know that children grow in height from birth until
approximately 18 years of age. By looking at the height of a child, we
can to some degree guess his or her age, thereby acknowledging that
there is a mean or standard height for children of specific ages.
Although we do acknowledge that a great variety of different heights
exist within each age bracket, most children of a certain age are
similar in height. If we were to obtain the average height of each age
group, we could compare any child of a certain age with the average
height of his or her age group and then decide whether this child was
average in height, above the average (taller), or below the average
(shorter). Because the average heights of the different age groups
would not be the same, we would have difficulty comparing children of
different ages with each other. However, if we standardised the height
measurement by taking into account that the height changes for the
different age groups, we would then be able to compare the height of
children of different ages. In this way, we could compare children in the same age group, but we could also compare children of different age groups directly with each other in terms of their relative standing within their own age groups. The same is done with intelligence test scores.
We must remember that an IQ or intelligence score is a theoretical
concept and it cannot reflect intelligence in all its diversity. The way in
which intelligence is defined will determine the type of tasks included
in a measure, which should be taken into account in the interpretation
of the score obtained. In addition, psychological measures are not as
accurate as physical measures like height, and this contributes to the
difficulty and controversy surrounding the measurement of cognitive
functioning. The inaccuracy of measurement in the social sciences is
partly due to the fact that we measure constructs, entities that are not
directly visible or directly measurable and which have to be measured
in some roundabout and indirect manner. We have to take a sample of
behaviour and on the basis of performance of particular tasks that
have been set, evaluate current performance and make predictions
about future performance in real-life activities. There are also various
factors other than cognitive ability that could affect test results. These
factors are discussed in more detail in Chapter 17.

10.3.2 Individual differences and cultural diversity


People in general, and psychologists in particular, have always been,
and will probably always be, interested in cognitive differences (and at
times similarities too) between people – this is probably part of human
nature. The fact that individuals differ from each other forms the basis
of the entire field of psychology. South Africa’s multicultural society
has had an eventful history. Its inhabitants, both individuals and
groups, are in various stages of adaptation to a generally Western
society, made up of the interesting but complicating diversity of all the
people who live here. The spectrum of differences is certainly
fascinating, but, for test developers and assessment practitioners, it
presents an unenviably complex task. Shuttleworth-Jordan (1996)
noted ‘signs of a narrowing and possibly disappearing gap across race
groups on cognitive test results in association with a reduction in
socio-cultural differences’ and takes this as indicative of similarities
between people in terms of cognitive processes (p. 97). Claassen
(1997) reported the same pattern of change and decreasing
differences in scores when comparing English-speaking and
Afrikaans-speaking whites* over the last 50 years. This indicates quite
clearly that differences in test results of various cultural groups can
often be attributed to the general level of acculturation of a group,
compared to the norm group for which the measures are constructed
and against whose results comparisons are made. Similar patterns
have been found in the US, where differences between children of
different cultural groups are markedly different from those found
between adults of the same groups. This indicates that ‘differences
are in a very real sense a barometer of educational and economic
opportunity’ (Petrill, Pike, Price & Plomin, 2004; Vincent, 1991, p.
269). Shuttleworth-Edwards (2017) warns about the difficulties of
using national norms in a multicultural context such as South Africa.
The Raven’s Advanced Progressive Matrices can be considered a
fair and less biased measure of cognitive functioning for use in
multicultural contexts, and similar results have been reported for gender and language groups in the same study (Cockcroft & Israel, 2011).
Cross-cultural assessment of cognitive ability (and other
psychological constructs) has received a lot of attention in recent
years. Researchers consider different aspects when investigating the
cross-cultural equivalence of measures (Helms-Lorenz et al., 2003).
The three aspects that Van de Vijver (2002) specifically referred to are
construct equivalence, method equivalence, and item equivalence
(also refer to Chapter 7 in this regard). Construct equivalence refers
to the fact that one needs to investigate and ensure that the construct
being measured is equivalent across different subgroups. One way of
doing this is to make use of factor analysis, to indicate that the
underlying pattern(s) of the construct(s) being measured is (are)
similar for various subgroups. Another way to investigate this aspect is
to compare the results of different measuring instruments that
measure the same construct – also referred to as construct validity. In terms of method equivalence, reference is made to the way in which measures are applied – ensuring that the contextual and practical application of measures does not lead to differences between subgroups. Aspects such as test sophistication, language proficiency, and so on, are relevant here. Lastly, for item equivalence one needs to ensure that different subgroups do not respond differently to a particular item because of the way it has been constructed rather than because of the construct being measured. In this regard, IRT (item response theory) and DIF (differential item functioning) methods are powerful tools for detecting differences in the performance of groups on particular items (De Beer, 2004).
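
The sketch below illustrates the basic logic behind a DIF check under a two-parameter logistic (2PL) IRT model. The item parameters and group labels are hypothetical, and operational DIF analyses estimate parameters from data and apply formal statistical tests rather than this simple comparison.

import math

# Illustrative only: a toy differential item functioning (DIF) check under a
# two-parameter logistic (2PL) IRT model, with hypothetical parameters.

def p_2pl(theta, a, b):
    # 2PL model: probability of a correct response for ability theta,
    # item discrimination a, and item difficulty b.
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical parameters for the same item, estimated separately in two
# subgroups. If the curves differ for test-takers of equal ability, the
# item functions differently across groups.
group_params = {"group_1": (1.2, 0.0), "group_2": (1.2, 0.6)}

for theta in (-1.0, 0.0, 1.0):
    probs = {g: round(p_2pl(theta, a, b), 2)
             for g, (a, b) in group_params.items()}
    print(f"ability {theta:+.1f}: {probs}")
# Equal-ability test-takers in group_2 have a lower probability of a correct
# response, flagging the item for review as potentially biased.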

❱❱ CRITICAL THINKING CHALLENGE 10.1


The role of cognitive assessment in South Africa
As mentioned elsewhere in this chapter, cognitive tests have a long history of use
in educational and industrial contexts for screening and selection purposes. The
reason for their inclusion is that they consistently show statistically and practically
significant relationships with work performance and educational achievement.

Discuss with fellow students or with someone knowledgeable in the field of cognitive assessment how cognitive assessments can be used in a fair and
equitable manner in a multicultural and multilingual context such as South
Africa, considering the vast differences in educational and socio-economic
factors that exist between some individuals and groups, which could affect
cognitive test results. Include in your discussion whether African-centred
understandings of what constitutes intelligence would enhance and expand
Western-based views of intelligence and what the implications of this are
for the cognitive assessment measures used in South Africa.

Figure 10.1 Example of cognitive ability items (left) inspired by African art (right)
SOURCE: Bekwa (2016). Reproduced by permission. Photograph supplied by Kaetana,
Shutterstock.

In a multicultural and multilingual country such as South Africa,
research has to include the investigation of possible cross-cultural
differences, fairness, and bias – in line with the requirements of the
Employment Equity Act (No. 55 of 1998).
Malda, Van de Vijver, and Temane (2010) have reported that
differences in cognitive test scores were linked to content familiarity.
Other research aimed at addressing this issue involves the use of
African-inspired item content for developing items for cognitive
assessment, as illustrated in Figure 10.1 (Bekwa, 2016; Bekwa & De
Beer, 2012).
You now have some background to the way we define and think
about intelligence, how we go about measuring it, and how the scores
can be interpreted. We will now provide some information on specific
measures of cognitive ability and use examples to illustrate the
different types of measures.
You can obtain information on specific measures from local test
publishers or sales agents for international test companies.

10.4 Measures of general cognitive functioning
10.4.1 Individual intelligence measures

Intelligence measures typically consist of a number of verbal and non-
verbal subtests that together provide an indication of general
intelligence (see Section 10.4.1.1). If a more comprehensive picture of
an individual’s functioning is required, performance on the different
subtests can also be analysed. Intelligence measures can be used in
various assessment contexts. Consider the following scenario:
A young boy has been living in a shelter for two years. A social worker has
now been able to place him in a school that is also prepared to provide
accommodation for him. However, they need to know if this child has the general
ability to benefit from mainstream education and/or if he has special needs.

This is an example of the type of situation where an individual
intelligence measure can be used to obtain an indication of a person’s
level of general intelligence, and cognitive strengths and weaknesses.
This information can then be used to recommend school placement as
well as possible interventions, should special learning needs be
identified.
Other contexts where general ability is taken into account include career counselling, selection processes, and situations where a person is not performing well at school or in the workplace. In such situations,
individual intelligence measures can provide information to guide
further psycho-diagnostic assessment (see Chapter 15).
10.4.1.1 Description and aim
The measures commonly used in South Africa are based on the
psychometric model of testing. This model assumes that the
individual’s ability to perform a variety of tasks that require intelligence
(in accordance with a specific theoretical definition) represents his or
her level of general intelligence (Wechsler, 1958). Therefore, the items
included in these measures cover a wide field and the different tasks
are related to different cognitive abilities. For example, in one of the
types of items in the Junior South African Individual Scales (JSAIS),
the assessment practitioner reads out a series of numbers, which the
test-taker must repeat in the same order. The cognitive ability
measured by this type of item is attention and immediate auditory
recall for numbers (remembering something that one has heard rather
than something that one has read) (Madge, 1981).
Items in an intelligence measure can be classified and described
according to item type and the specific abilities measured. In a test
such as the Individual Scale for General Scholastic Aptitude (ISGSA)
a variety of items are arranged in ascending order of difficulty (i.e.
becoming more difficult) (Robinson, 1996). Together they measure
general ability. However, it is also possible to group relatively
homogeneous (similar) items into subtests. Box 10.1 contains a
description of the subtests of the Wechsler Adult Intelligence Scale –
Fourth SA Edition (WAIS-IVSA) (Maccow, 2011; Wechsler, 2014).
Scores are also provided for groups of subtests that have mental
abilities in common (i.e. factors such as verbal comprehension,
processing speed, and so on). In Box 10.1 we show how the 15
subtests of the WAIS-IVSA are grouped together to produce four index
scores, namely the Verbal Comprehension Index (VCI), the
Perceptual Reasoning Index (PRI), the Working Memory Index (WMI),
and the Processing Speed Index (PSI). In addition, the Full Scale IQ
(FSIQ) comprises the 10 core subtests, thus representing the underlying general factor of intelligence (Spearman's g factor). The Global Intelligence Scale of the JSAIS renders four subscales, namely a Verbal IQ Scale, a Performance IQ Scale, a Numerical Scale, and a Memory Scale, each consisting of a number of subtests (Madge, 1981). In the case of the Senior South African Individual Scale – Revised (SSAIS-R), scores are available on a Verbal Scale, a Non-verbal Scale, and a Full Scale (Van Eeden, 1991). You should note
that advances in the understanding of the structure of intelligence
imply that in the case of the latest versions of individual intelligence
measures (e.g. the WAIS-IVSA), a Verbal IQ and a Performance IQ
are no longer provided. Interpretation is based only on the finer
distinctions as reflected in the index scores.
In Box 10.2 descriptions of the Griffiths III (Stroud, 2016) and the
Grover-Counter Scale of Cognitive Development (GCS) (HSRC, 1994)
illustrate the unique combinations of tasks used in diagnostic
measures for young children to assess the functions associated with
the developmental stages of this phase. The Griffiths Mental
Development Scales – Extended Revised (GMDS-ER) is a 2006
revision of the 1996 version (Luiz, Barnard, Knoesen, Kotras,
Horrocks, McAlinden, Challis & O’Connell, 2006). It consists of two
sets of scales, one for infants and toddlers (0 to 2 years) and a set of
scales for young children (2 to 8 years). Some South African research
has been conducted on the GMDS-ER (e.g. Van Heerden, 2007; Van
Rooyen, 2005) in which ‘normal’ South African children were
compared with their counterparts in the standardisation sample in the
UK. Van Heerden’s (2007) study raised serious questions about the
applicability of the norms developed in the UK for South African
children. Arguments have, however, been presented for and against
the local standardisation of these scales (Jacklin & Cockcroft, 2013).
The Griffiths III (the latest version that was released in 2016) can be
used from birth (1 month) to 5 years and 11 months (Stroud, 2016).
The GCS is a behavioural test specifically aimed at individuals with
extremely impaired verbal skills. It is also suitable in other situations
where a verbal test may not be suitable, for example where the
language of the administrator is not the mother tongue of the test-
taker. Normative information and support for the reliability and validity
of the test are available for African-language-speaking children. A
growing body of research is developing on the scale (e.g. Dickman, 1994), and it has proved clinically useful in South Africa.

BOX 10.1 INDEX SCORES AND SUBTESTS OF THE WAIS-IVSA


Verbal Comprehension Index (VCI)
Similarities (core subtest): The test-taker is presented with pairs of
words that represent common objects or concepts and he or she
must indicate how they are similar. The abilities measured include
verbal and abstract reasoning as well as verbal concept formation.
Vocabulary (core subtest): The test-taker names objects that are
presented visually (picture items) and defines words presented visually
and orally (verbal items). The subtest measures language development
and usage, word knowledge, and verbal concept formation.
Information (core subtest): Questions are asked that cover a broad
range of general-knowledge topics. The subtest measures the ability
to acquire, retain, and retrieve general factual knowledge.
Comprehension (supplemental subtest): The test-taker answers questions based
on his or her understanding of social situations and conventional standard of
behaviour. The abilities measured include verbal reasoning and conceptualisation,
verbal comprehension and expression, the ability to evaluate and use past
experience, and the ability to demonstrate practical knowledge and judgement.

Perceptual Reasoning Index (PRI)


Block design (core subtest): There is a specified time limit for this test. The
test-taker views a model and a two-dimensional picture, or a two-dimensional
picture only. He or she must then reproduce the design using three-
dimensional red-and-white blocks. The subtest measures the ability to
analyse and synthesise abstract visual stimuli as well as visual-motor skills.
Matrix reasoning (core subtest): In this subtest the test-taker has to identify
patterns. He or she views an incomplete matrix or series and selects the response
option that completes the matrix or series. The abilities required to master these
tasks include fluid intelligence, broad visual intelligence, classification
and spatial ability, knowledge of part-whole relationships,
simultaneous processing, and perceptual organisation.
Visual puzzles (core subtest): There is a specified time limit for this test. The
test-taker views a completed puzzle and then selects three response options
that can be combined to reconstruct the puzzle. The subtest measures visual
perception and organisation, non-verbal reasoning, spatial visualisation and
manipulation, and the ability to anticipate relationships among parts, including
the ability to analyse and synthesise abstract visual stimuli.
Picture completion (supplemental subtest): Working with a specified time limit,
the test-taker has to identify the important part missing from a picture. The
subtest measures visual perception and organisation as well as concentration.
The visual recognition of essential details of objects is also important.
Figure weights (supplemental subtest): There is a specified time limit
for this test. A scale with missing weight(s) is presented and the test-
taker has to select the response option that keeps the scale balanced.
Quantitative and analogical reasoning as well as working memory are required. The task furthermore involves inductive and deductive logic.
Working Memory Index (WMI)
Digit span (core subtest): The assessment practitioner reads a sequence of
numbers that the test-taker has to recall in the same order (Digit Span
Forward), or in reverse order (Digit Span Backward) or in ascending order
(Digit Span Sequencing). The shift between tasks requires cognitive flexibility
and mental alertness. This subtest measures attention, rote learning and
memory, working memory, encoding, auditory processing, transformation of
information, mental manipulation, and visuospatial imaging.
Arithmetic (core subtest): The test-taker has to mentally solve
arithmetic problems within a time limit. In addition to numerical-
reasoning ability, concentration, attention, mental alertness, mental
manipulation, and short- and long-term memory are measured.
Letter-number sequencing (supplemental subtest): A sequence of numbers
and letters is read to the test-taker, who has to recall the numbers in ascending
order and the letters in alphabetical order. The subtest involves sequential
processing, mental manipulation, attention, concentration, memory span, and
short-term auditory memory.
Processing Speed Index (PSI)
Symbol search (core subtest): Working within a specified time limit, the test-
taker scans a search group and indicates whether one of the symbols in the
target group matches. This subtest measures processing speed, short-term
visual memory, visual-motor coordination, visual discrimination, psychomotor
speed, speed of mental operation, attention, and concentration.
Coding (core subtest): Using a key, the test-taker copies symbols that are paired
with numbers within a specified time limit. The subtest measures processing
speed, short-term visual memory, psychomotor speed, visual perception, visual-
motor coordination, visual scanning ability, and attention and concentration.
Cancellation (supplemental subtest): This subtest has a time limit. The
test-taker scans a structured arrangement of shapes and marks target
shapes. The abilities required include processing speed, visual selective
attention, vigilance, perceptual speed, and visual-motor ability.

BOX 10.2 THE GRIFFITHS III AND THE GCS


The Griffiths III
The Griffiths III is a play-oriented developmental measure for infants and young
children from 1 month to 5 years and 11 months old. The Griffiths III comprises five
separate scales: Foundations of Learning (Subscale A), which assesses aspects of
preschool learning that are foundational for progressing successfully in the early
school grades (e.g. cognitive skills, metacognitive skills, memory, and imaginative
play); Language and Communication (Subscale B), which gives opportunity to study
the growth and development of language as well as cognitive aspects such as
attention, listening, and verbal memory; Eye and Hand Coordination (Subscale C),
which consists of items that assess the fine-motor skills, manual dexterity, and visual
perceptual skills of the child; Personal-Social-Emotional (Subscale D), which
assesses skills related to personal development (e.g. development of self-concept,
independence, and daily living skills), social development (e.g. interacting with
others and the development of humour, friendships, and play skills), and emotional
development (e.g. understanding, expressing and regulating emotions, perspective-
taking, moral development, and theory of mind); and Gross Motor
(Subscale E), which assesses posture, balance, gross body and
visual-spatial coordination, power and strength, rhythm as well as
motor sequencing (Stroud et al., 2016).
The GCS
The GCS is based on Piagetian theory and was developed
according to Piaget’s stages of development. It consists of five
domains, each measuring a stage in a person’s development.
Section A consists of simple recognition of shapes and colours. Section
B assesses the ability to reconstruct patterns from memory. Section C
assesses the ability to copy a model made by the tester. Section D
assesses the ability to reconstruct from memory once the model has been
removed. Section E assesses the ability to produce and complete patterns.

The administration and scoring procedures of individual intelligence
measures are usually standardised. This means that there is a written
set of instructions that should be followed to ensure a uniform
(standardised) method of administration and scoring. However,
individualised administration also enables the assessment practitioner
to interact with the test-taker. Depending on the reason for referral
(why the individual needs to be tested), individual measures might
therefore be more suitable than group measures. For example, if a
referral is made where underlying emotional problems affect academic
or work performance, the assessment practitioner has to put such a
person at ease and develop rapport to ensure that performance on the
measure reflects his or her ability. Any non-cognitive factors that could
influence performance on the measure, such as anxiety, are
observed, noted, and taken into account when interpreting the results.
Another example is the referral of children with learning problems. If a
child has trouble reading or writing, scholastic aptitude and other
group measures might underestimate his/her ability. In an
individualised situation, test behaviour also provides clues to possible
reasons for the learning difficulties, such as distractibility (see
Chapters 9 and 15).
10.4.1.2 The application of results
Different measures have been developed for different age groups.
Measures with South African norms include the GCS (3 years to 10
years), the JSAIS (3 years to 7 years and 11 months), the SSAIS-R (7
years to 16 years and 11 months), the WAIS-III (16 years to 69 years
and 11 months) and the WAIS-IVSA (16 years to 59 years and 11
months). Please note that although the three indigenous tests for
children continue to be widely used and researched, renorming of
many of these tests is crucial to ensure continued relevance. Original
norming was done approximately 30 years ago, implying that the
norms may be obsolete.
Individual intelligence measures are used for the following
reasons:
• to determine an individual’s level of general ability in comparison with his or her age group
• to compare a person’s own performance on the different scales and indices as well as subtests of a measure in order to gain an understanding of his or her unique intellectual functioning.
General ability
The raw scores on individual intelligence measures are converted to
standard scores or normalised scaled scores. This implies that an
individual’s performance on a specific test or subtest is expressed in
terms of the performance of a fixed reference group called the norm
group. The individual’s overall performance on the intelligence
measure (the Full Scale score or the Global Scale score), as well as
his or her performance on subscales (e.g. the Verbal Scale score, the
Performance or Non-verbal Scale score and the Numerical Scale
score) and indices (e.g. the Working Memory Index and the
Processing Speed Index), can therefore be compared with the
performance of others in the particular norm group – often specified in
terms of age. This information is used in educational and occupational
counselling, as well as in the selection and placement of students, and
in personnel selection (see Chapter 15). You should keep in mind that
decisions are not based on a single test score. Biographical
information, academic and work performance, and results from other
measures are also taken into account (see Chapter 1). In a survey by
Van der Merwe (2002), organisations that used individual scales as
part of their assessment process indicated that while they used testing
for personnel selection, they also used it for other purposes. The latter
includes the use of psychometric tests for placement, promotion,
transfers and, more importantly, in terms of individual intelligence
measures, for the identification of potential and training needs,
performance counselling, and career development. However, these
measures are more suitable for individual counselling (see below),
and group tests of intelligence and aptitude tests are regarded as
more suitable for general use in industry.
Profile of abilities
Working with standard scores also implies that scores are expressed
in comparable units. In addition to comparison with the norm group, it
is possible to compare a person’s own performance on the different
scales and indices (e.g. to see how the use of words and symbols on
the Verbal Comprehension Index compares with the ability to perceive
visual patterns on the Perceptual Reasoning Index) and in the various
subtests. If someone obtains a relatively higher score for a subtest,
some of the abilities measured by that subtest can be regarded as
strengths. Similarly, relatively lower scores could indicate
weaknesses. (Refer to Box 10.1 for an example of the types of
subtests included in intelligence measures and the abilities measured
by the various tasks. This box contains a description of the subtests of
the WAIS-IVSA.) This information is used to formulate hypotheses
about an individual’s problems, to suggest further assessment, and
even help in planning intervention and remediation. This process
requires experience as well as comprehensive knowledge of the
theory that relates to the abilities measured by a test and the research
that has been done on these abilities (Kaufman, 1979).
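
As a minimal sketch of the kind of within-person comparison described above, the following Python fragment flags index scores that deviate markedly from a test-taker's own average. The scores and the 10-point threshold are hypothetical; in practice, profile interpretation follows the critical values and base rates in the relevant test manual, not an ad hoc rule.

# Illustrative only: flag relative strengths and weaknesses by comparing a
# test-taker's index scores to his or her own mean. The index labels mirror
# the WAIS-IV indices, but the scores and the 10-point cut-off are
# hypothetical.

scores = {"VCI": 112, "PRI": 96, "WMI": 104, "PSI": 88}
personal_mean = sum(scores.values()) / len(scores)

for index, score in scores.items():
    deviation = score - personal_mean
    if deviation >= 10:
        label = "relative strength"
    elif deviation <= -10:
        label = "relative weakness"
    else:
        label = "within personal average range"
    print(f"{index}: {score} ({deviation:+.1f} vs own mean) -> {label}")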
Research in the South African context
The following are important considerations when using individual
intelligence measures (as well as the other measures discussed in this
chapter):
• the reliability and validity of a measure for a specific context (the concepts of reliability and validity are explained in Chapters 4 and 5)
• how suitable the measure is for cross-cultural use.
The test manuals of tests standardised in South Africa (including the
GCS, the JSAIS, the ISGSA, the SSAIS-R, the WAIS-III and the WAIS-
IVSA) contain information on reliability and construct validity, and usually
also report on how the results can be used to predict academic
performance. Results are mostly satisfactory for the standardisation
samples. It is also important to consult literature on research in this area.
In Table 10.1 a number of local studies on the use of individual
intelligence tests in different contexts are summarised.
Table 10.1 Local research findings on individual intelligence measures

TEST | SCALE/SUBTEST (WHERE RELEVANT) | RESULTS
GMDS (previous version of Griffiths III) | General Quotient; Locomotor Scale | The level of education of the mother impacted significantly on the performance of a group of black South African infants (Cockcroft, Amod & Soellaart, 2008).
BSID | | Infants were tested and the findings implied a moderate degree of diagnostic accuracy in identifying children at risk of late school entry. The demographically representative sample included urban South African children and their families (Richter, Musawenkosi & Mabaso, 2016).
BSID III | Cognitive Scale; Social-Emotional Scale | Significant improvements were found after infant exposure to an early childhood development programme presented by the parents (Van Eeden & Van Vuuren, 2017).
GCS | | This is a valid measure to determine the influence of socio-environmental factors on cognitive development in the case of school-aged children in rural KwaZulu-Natal (Ajayi, Matthews, Taylor, Kvalsvig, Davidson, Kauchali & Mellins, 2017).
K-ABC | Sequential Processing; Simultaneous Processing | Support was found for the construct validity of the measure in an African population (preschool and school-aged children) (Jansen & Greenop, 2008).
KABC-II | Hand Movements; Atlantis | This is a valid measure to determine the influence of socio-environmental factors on cognitive development in the case of school-aged children in rural KwaZulu-Natal (Ajayi et al., 2017).
JSAIS | | This measure is valid for the evaluation of school readiness at the stage of school entrance (Robinson & Hanekom, 1991; Theron, 2013).
JSAIS | Number and Quantity Concepts | This subtest is useful to identify children who are at risk for underachievement in mathematics at school entry level (Roger, 1989).
JSAIS | | A characteristic pattern of scores was identified for learning-disabled children with reading problems that distinguishes them from ‘normal’ children (Robinson, 1986).
JSAIS | | The measure correlates with the GMDS, thus making the measure suitable for early diagnosis of problems and subsequent treatment (Luiz & Heimes, 1988).
SSAIS | Non-verbal Scale | The scale is suitable for assessing hearing-disabled children (Badenhorst, 1986; Du Toit, 1986).
SSAIS-R | Vocabulary; Similarities; Story Memory | The subtests are useful in identifying components of reading ability (Cockcroft & Blackburn, 2008).
SSAIS-R | | The measure is reliable and valid for white, coloured, and Indian students, although some variations in factor structure and in the predictive value of scores were reported (Van Eeden & Visser, 1992).
WAIS-III | Matrix Reasoning; Digit Span; Letter-Number Sequencing | The performance of black South African and predominantly white British undergraduate students was compared. Only the subtests tapping working memory and fluid reasoning showed no cross-cultural effects (Cockcroft, Alloway, Copello & Milligan, 2015).
WAIS-III | Verbal subtests | The Afrikaans translation is suitable for an urbanised sample with good-quality education (Grieve & Van Eeden, 2010).
WAIS-IV | | Reference points were provided for an educationally disadvantaged African first-language population and the potential advantage of using the supplemental subtests for this group was shown (Pienaar, Shuttleworth-Edwards, Klopper & Radloff, 2017).

It is also important to take account of the cultural or language group
for which norms were determined. Equivalent individual intelligence
measures have been constructed for South Africans whose home
language is English or Afrikaans (i.e. ISGSA, JSAIS, and SSAIS-R).
Norm groups included Indian, coloured, and white children (Landman,
1988; Robinson, 1989; Van Eeden & Visser, 1992). A slightly modified
version of the WAIS-III has been standardised for English-speaking
South Africans and black respondents were included in the norm
group along with Indians, coloureds, and whites (Claassen, Krynauw,
Paterson & wa ga Mathe, 2001). In the case of the WAIS-IVSA, the
criterion for inclusion in the norm sample was sufficient understanding
of the instructions in English (Shuttleworth-Edwards, 2016; Taylor,
2016). The four population groups were proportionally represented in
the norm sample. Testing a person in a language other than his or her
home language raises questions about the influence of language
proficiency on test performance. Foxcroft and Aston (2006) critically
examined the suitability of the WAIS-III for different cultural and
language groups and concluded that some bias might exist for English
second-language speakers. However, because the language of
education is often English, assessing individuals in an African
language might also disadvantage them (Van den Berg, 1996). Nell
(1999) referred to studies in which African-language speakers scored
lower in both verbal and performance subtests, indicating that more
than just language alone should be considered when determining the
cross-cultural validity of individual intelligence measures.
Demographic variables, especially test-wiseness as determined by
education, urbanisation, and socio-economic level, also affect
performance (Nell, 1999; Shuttleworth-Edwards, Kemp, Rust,
Muirhead, Hartman & Radloff, 2004). Environmental deprivation was
shown to explain part of the variance in performance on the SSAIS-R
(Van Eeden, 1991) and the ISGSA (Robinson, 1994). Consequently,
separate norms for these tests were developed for non-
environmentally disadvantaged children. Although reliability and
validity were similar, substantial differences between the ethnic
subpopulations of the English-speaking norm sample of the WAIS-III
were attributed to quality of education experienced (Claassen et al.,
2001). It is suggested that education, among other variables, should
be considered on a case-by-case basis when interpreting results (refer
to Chapter 15). Research on this issue is discussed further in the
Advanced Reading box. Using the measuring instruments discussed
here in a multicultural context places a great responsibility on the
psychologist in the application of assessment in practice. This
provides further justification for considering alternatives in contexts
where more general indications of ability are required, for example
using competency-based testing in industry.
ADVANCED READING: THE ISSUE OF NORM GROUPS IN A
MULTICULTURAL CONTEXT

The compilation of norm groups for intelligence tests in a multicultural context
is complicated by the influence of moderator variables on test performance.
Ethnic-specific norms cannot be considered a solution, given the heterogeneity
within ethnic groups on moderator variables such as education (Claassen et
al., 2001). Cross-cultural studies with the WAIS-III and the Wechsler
Intelligence Scale for Children – Fourth Edition (WISC-IV) clearly indicated
the role of quality of education regardless of the subgroup (Shuttleworth-Edwards et al., 2004; Shuttleworth-Edwards, Gaylard & Radloff, 2013;
Shuttleworth-Edwards, Van der Merwe, Van Tonder & Radloff, 2013). For
example, within the black-African first-language sample, differences in quality
of education were reflected in terms of performance on these tests.
The debate on the local norming of the WAIS focussed on the use of
population-based versus demographically adjusted norms. Population-based
norms were provided for the English versions of the WAIS-III and the WAIS-IVSA
(Claassen et al., 2001; Taylor, 2016). English-speaking South Africans (not only
English first-language speakers) were included in the norm samples with equal
(in the case of the former) and proportional (in the case of the latter)
representation of the different racial groups. The magnitude of socio-cultural
subgroup differences, however, implies a strong possibility of under- or over-
estimation of an individual’s performance based on subgroup status. A potential
solution is to use the typical performance of particular subgroups of the larger
population as a criterion or standard to obtain additional information on a case-
by-case basis, especially for diagnostic and remedial purposes (Raiford, Weiss,
Rolfhus & Coalson, 2005, 2008). Shuttleworth-Edwards (2016, 2017) regards
strict stratification in terms of level and quality of education as essential in the
local context to prevent the confounding of results for the majority group and
potential misdiagnosis in clinical contexts. She provided guidelines for using
information on the typical performance of a relevant subgroup (referred to as
within-group norms) in interpreting IQ test scores (e.g. Shuttleworth-Edwards,
2012). Sunderaraman, Zahodne and Manly (2016), however, caution that these
reference points are not in place of, but in addition to, population-based norms.
10.4.2 Group tests of intelligence

Every year, large numbers of students who have just finished school apply for
admission to higher-education institutions such as universities and universities of technology. Government funds these institutions on the basis of the number of
students who pass, so they prefer to admit students who will be successful in
their studies. Admission testing programmes are often used to select students
who are most likely to benefit from what is offered educationally and who are
most likely to meet the criteria of success as defined by each institution.

General intellectual ability is one of the aspects often evaluated in
selection decisions for admission to educational institutions and
training, as well as for placement in industry. Where large numbers
are to be assessed, it is more convenient to use a group measure of
intelligence rather than the individual measures discussed in the
previous section. Furthermore, group tests are designed to assess
cognitive abilities that are relevant to academic achievement. Usually,
a combination of verbal and non-verbal subtests provides a total ability
score. However, some group measures contain only verbal or non-
verbal content. The development of culture-fair measures, for
example, focuses on non-verbal measures of general intelligence.
10.4.2.1 Description and aim
Similar to the position with individual measures, the main
consideration when selecting items for group measures is that each
type of item should provide a good indication of a person’s level of
general intelligence. Variation in item content is necessary to measure
a general factor of intelligence (Spearman’s g). Items in group
intelligence measures commonly measure the ability to solve
problems that involve figures, verbal material (words and sentences),
and numbers. Box 10.3 contains a practice example from the Word
Analogies subtest of the General Scholastic Aptitude Test (GSAT)
Senior Series (Claassen, De Beer, Hugo & Meyer, 1991, p. 12). This is
an example of a verbal item. Box 10.4 contains an example of a non-
verbal item, namely a practice example from the Pattern Completion
subtest of the GSAT Senior (Claassen, De Beer, Ferndale, Kotzé, Van
Niekerk, Viljoen & Vosloo, 1991, p. 19; Claassen, De Beer, Hugo et
al., 1991, p. 18).

BOX 10.3 GSAT SENIOR: WORD ANALOGIES

mouse : squeak :: frog : ?

We call the sound that a MOUSE makes a SQUEAK and the sound that a
FROG makes a CROAK. MOUSE stands to SQUEAK as FROG stands to
CROAK. CROAK should therefore go in the place of the question mark.
The letter above CROAK is C, therefore the answer to Question 1 is C.
SOURCE: Copyright © by the Human Sciences Research Council. Reproduced by permission.

Items are grouped together according to item type in homogeneous
subtests and the items in each subtest are arranged in increasing
order of difficulty. In the GSAT Senior (Claassen, De Beer, Hugo et
al., 1991), for example, subtests are also grouped into verbal and non-
verbal subscales and scores are provided for these subscales. It is
now possible to see if there is a difference in the individual’s ability to
solve problems with verbal and non-verbal content. The total score is
assumed to represent an underlying general factor of intelligence.
Some group tests provide only a verbal or a non-verbal measure of
general intelligence. In the LPCAT (De Beer, 2000a, 2000b, 2006), for
example, the test-taker analyses sets of figures and deduces basic
relations. This is therefore a measure of non-verbal reasoning ability
and provides an index of an individual’s general intellectual ability
without the influence of verbal skills.

BOX 10.4 GSAT Senior: Pattern Completion


In the first row, in the first square there is a small square with one small
dot, then a slightly bigger square with two dots, and then an even bigger
square with three dots in it. The second row looks exactly like the first,
and the third row also begins like the first two rows. We therefore expect
that this row should look like the other two rows. In the third square we
therefore need a large square with three dots, and that is E.
SOURCE: Copyright © by the Human Sciences Research Council. Reproduced by permission.

Group measures are designed to be administered simultaneously to a
number of persons. However, the size of the group should be limited
to ensure proper supervision and rapport. According to Claassen, De
Beer, Hugo et al. (1991), an assessment practitioner needs assistance
if a group exceeds 30 persons. The test items are often printed and
require only simple responses or are of a multiple-choice format (e.g.
the GSAT). Answers are recorded in a test booklet or on an answer
sheet. Computer-based assessment (e.g. the LPCAT) has the
advantage that modern psychometric methods can be used. It also
saves time in terms of administration and feedback. In all instances,
practice examples are given for each type of item. After completion of
the practice examples the test-taker continues with the test or subtest
on his or her own. An advantage of group measures is that the scoring
can be done objectively and quickly (often by computer).
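Because responses are simple and keyed, objective scoring of this kind amounts to matching each response against a memorandum. The following is a minimal Python sketch, with an invented five-item key and response set purely for illustration:

    # Hypothetical answer key and one test-taker's multiple-choice responses
    key = ["C", "E", "A", "B", "D"]
    responses = ["C", "E", "B", "B", "D"]

    # The raw score is simply the count of responses that match the key
    raw_score = sum(1 for k, r in zip(key, responses) if k == r)
    print(raw_score)  # 4 out of 5

Because no judgement is involved in this step, different scorers (or a computer) will always produce the same raw score, which is what makes group measures quick and objective to score.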
The administration procedure is standardised, and the assessment
practitioner is only required to read the instructions and make sure
that time limits are adhered to. The testing conditions are more
uniform than in the case of individual measures, but the opportunity for
interaction between the assessment practitioner and test-takers is
limited. We mentioned that some individuals might score below their
ability owing to anxiety, motivational problems, and so on; these non-
cognitive factors that could influence test performance are more
difficult to detect in a group situation. In addition, group measures
assume that the people being tested have normal reading ability. If an
individual has reading problems, his or her score in a group measure
should not be regarded as a valid estimate of his or her ability. It
should be noted that group measures can also be administered to
individuals.
10.4.2.2 The application of results
General ability
Group measures are used at different stages in the educational and
career-counselling process. The range of difficulty covered by a
measure should be suitable for the particular age, grade, or level of
education for which it is intended. Different forms of a measure for
different ability levels normally provide continuity in terms of content
and the nature of the abilities measured. Therefore, it is possible to
retest a person and compare his or her score over several years.
Different age groups or grade groups can also be compared. For
example, the GSAT has a Junior Series (9 years to 11 years), an
Intermediate Series (11 years to 14 years), and a Senior Series (14
years to 18 years) (Owen & Taljaard, 1996). The Paper and Pencil
Games (PPG) can be used for younger children (Grades 2 to 4)
(Claassen, 1996). The LPCAT can be administered to respondents
who are 11 years and older. The test is applicable at different
educational levels ranging from low-literate to tertiary/university levels
(De Beer, 2013).
The scores on group measures of general ability are expressed as
standard scores and an individual’s performance can be compared
with that of a reference group (the norm group). This gives an
indication of the individual’s level of reasoning ability or problem-
solving ability. Because this information is relevant to his or her
academic achievement, these measures can be used for educational
counselling, as well as for the selection and placement of students at
school and at tertiary institutions. For example, students who could
benefit from special education (gifted individuals or those with learning
problems) can be identified. Although further testing, for instance with
individual intelligence measures, would be necessary to obtain a
differential picture of abilities, this initial screening helps to limit the
number of persons involved. Group measures can also be used as
part of the selection process in industry.
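To make the norm-referenced logic described above concrete, the short Python sketch below converts a raw score to a deviation-style standard score (mean 100, standard deviation 15). All numerical values are invented for illustration; an actual measure's manual prescribes its own norm tables and score metric.

    def standard_score(raw, norm_mean, norm_sd, scale_mean=100, scale_sd=15):
        # z expresses the raw score's distance from the norm-group mean in SD units
        z = (raw - norm_mean) / norm_sd
        # rescale z to the conventional deviation-score metric
        return scale_mean + scale_sd * z

    # Hypothetical example: a raw score of 42 against a norm group with
    # mean 36 and SD 8 yields a standard score of about 111
    print(standard_score(42, 36, 8))

A score of about 111 would place the individual somewhat above the average of the reference group, which is exactly the kind of comparative statement these measures are designed to support.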
Research in the South African context
Similar to individual intelligence measures, the content of group
measures is culture-specific, and knowledge of and familiarity with the
material could influence test results. Claassen and Schepers (1990)
found that mean differences on the GSAT Intermediate for different
population groups were related to differences in socio-economic
background. In addition, test-wiseness (i.e. being accustomed to
testing) may play a somewhat greater role than in the case of
individual measures (Anastasi & Urbina, 1997). Claassen, De Beer,
Hugo et al. (1991) consequently preferred the term ‘scholastic
aptitude’ to ‘general intelligence’ and these authors maintained that
the GSAT measures ‘a student’s present level of reasoning ability in
respect of scholastic material’ (p. 25). Only in the case of students
who are not environmentally disadvantaged can this score also be
interpreted as an intelligence score.

❱❱ CRITICAL THINKING CHALLENGE 10.2

Consider the comment by Claassen, De Beer, Hugo et al. (1991, p. 25) above – that
a group intelligence test measures ‘a student’s present level of reasoning ability in
respect of scholastic material’ – to which a further comment is added that a group
intelligence test score can only be interpreted as an intelligence score for test-takers
who do not come from environmentally disadvantaged contexts. Would
these comments also apply to individual intelligence test scores?
Reread Section 10.4.1.2 to help you.

Lack of knowledge of the language used also influences test results if a person is not assessed in his or her mother tongue. This effect
seems less significant in the case of non-verbal measures (Claassen
& Hugo, 1993; Hugo & Claassen, 1991). In the case of the PPG,
instructions are presented in the mother tongue (Claassen, 1996). The
instructions for the LPCAT have also been translated into the 11
official South African languages. According to De Beer (2006), the
LPCAT makes provision for differences between culture groups,
among other reasons, due to its non-verbal figural item content. It is
often assumed that non-verbal measures cannot be used as
successfully as verbal measures to predict academic achievement.
However, De Beer (2006) found that the predictive validity of the
LPCAT is generally acceptable at school and at junior tertiary levels
with lower correlations with academic performance found for
postgraduate students. The construct validity of the measure is
acceptable irrespective of age or attained level of education. Similarly,
Schaap (2011) showed that, although the non-verbal PiB/SpEEx Observance test is not completely language insensitive, it can be regarded as a suitable language-reduced option in a multilingual
context. The Raven Progressive Matrices (RPM) is a non-verbal
measure regarded as suitable for cross-cultural use. South African
norms on this test are available for various subgroups (JvR Africa
Group, 2017). Cockcroft and Israel (2011) found support for the use of
the Raven’s Advanced Progressive Matrices in a multicultural,
multilingual context. Using gender and home language as stratification
variables in a group of university students, correlations with verbal
tests supported the convergent validity of the matrices across the
subgroups. Owen (1992b) compared the performance of white,
coloured, Indian, and black students in Grade 9 for the standard form
of the RPM. He found that the measure ‘behaves in the same way for
the different groups’, but that large mean differences between the
white and black groups in particular imply that common norms cannot
be used (p. 158). The author suggested that children from the different
groups do not necessarily use the same solution strategies. This
finding relates to the warning by Cockcroft et al. (2015) that non-verbal
tests are not necessarily more suitable than verbal tests in a
multicultural context. It will depend on the knowledge and skills
required by the test. If the same set of test norms is used for different
groups, it is important to establish whether the same construct is
measured and if a common regression line predicts academic
achievement equally well for each group (see, for example,
Claassen’s (1990) study on the comparability of GSAT scores for
white, coloured, and Indian students).
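One way to examine the second condition is to test whether group membership adds predictive information beyond the test score itself, using a moderated multiple regression. The sketch below is a minimal illustration of this idea, not the procedure used in the studies cited: the variable names (score, group, achievement) and the data file are hypothetical, and it assumes the pandas and statsmodels libraries are available.

    import pandas as pd
    import statsmodels.formula.api as smf

    # One row per test-taker: 'score' (test score), 'group' (group label),
    # and 'achievement' (the academic criterion)
    df = pd.read_csv("norm_study.csv")  # hypothetical data file

    # Model allowing each group its own intercept and slope
    full = smf.ols("achievement ~ score * C(group)", data=df).fit()
    # Model forcing a single, common regression line
    common = smf.ols("achievement ~ score", data=df).fit()

    # A significant F-test suggests the groups need separate regression
    # lines, in which case common norms would be suspect
    print(full.compare_f_test(common))

If the intercepts or slopes differ meaningfully between groups, the same obtained score predicts different criterion performance depending on group membership, which is precisely the fairness concern raised above.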

❱❱ ETHICAL DILEMMA CASE STUDY 10.1

Thabo is a 10-year-old isiXhosa-speaking child in Grade 2. He attends school in a rural village near the Lesotho border. He is not progressing well at school as he
struggles with reading, so his teacher refers him for an assessment. The
psychologist has only been trained to use the Senior South African Individual Scale – Revised (SSAIS-R), so he administers it to Thabo. The results indicate
that Thabo’s intellectual ability is below average. The psychologist then reports to
the teacher that the reason why Thabo battles to read is because his cognitive
ability is below average. He recommends that the teacher needs to lower her
expectations of what Thabo is capable of achieving.

Questions:
(a) Was the psychologist acting in an ethically responsible way in
administering the SSAIS-R to Thabo and in reaching the
conclusions that he did? Motivate your answer.
(b) Advise the psychologist how he or she can avoid transgressing
ethically in a similar situation in the future.

10.5 Measures of specific abilities


10.5.1 Aptitude measures

Consider the following scenario:


A major car manufacturer appoints people who have completed a number
of years of schooling as apprentices. They are then trained to become motor
mechanics. When selecting these apprentices, level of general intelligence,
academic achievement, previous experience, etc. are taken into account. In
addition, an apprentice should have above-average mechanical aptitude.

What is meant by the term ‘mechanical aptitude’, and is it similar to or different from ‘general intelligence’? The term aptitude is used to
describe a specific ability. It refers to the individual’s ability to acquire,
with training, a specific skill or to attain a specific level of performance.
According to Owen (1996), the abilities measured by aptitude
measures correspond to the intellectual abilities measured by tests of
general ability. The difference is that the items and subtests of
measures of general cognitive functioning are selected primarily to
provide a unitary measure, and not to provide an indication of
differential abilities. In addition, certain areas of functioning relevant to
guidance in educational and occupational contexts are not covered by
measures of general cognitive functioning (e.g. mechanical and
clerical skills).
10.5.1.1 Description and aim
The abilities measured by different aptitude measures include
reasoning on the basis of verbal, non-verbal, and quantitative material,
language comprehension, spatial ability, perceptual speed, memory,
coordination, and so on. Tasks that measure the same ability are
grouped together in homogeneous tests. For example, each item in
the Mechanical Insight test of the Differential Aptitude Tests Form R
(DAT-R) consists of a drawing representing a mechanical apparatus
or portrays a mechanical or physical principle. This test measures
mechanical ability or the ability to perceive mechanical principles and
to understand the functioning of apparatus based on these principles
(Claassen, Van Heerden, Vosloo & Wheeler, 2000). A special aptitude
test focuses on a single aptitude. There are a variety of special
aptitude tests that measure a comprehensive range of work skills at
different levels. In Table 10.2 the SHL Verify range of ability tests
(CEB, 2013−2016) illustrates some aptitudes that are typically
measured in organisational contexts. These tests are used in
combination to assess abilities relevant to specific jobs and job levels
(as seen in Table 10.2).
Table 10.2 The SHL Verify range of ability tests and examples of suitable combinations

VERIFY RANGE                DIRECTORS AND SENIOR MANAGERS    INFORMATION TECHNOLOGY STAFF

Numerical Reasoning ✔ ✔

Verbal Reasoning ✔ ✔

Inductive Reasoning ✔

Deductive Reasoning ✔ ✔

Mechanical Comprehension ✔

Spatial Ability ✔ ✔

Checking ✔

Calculation

Reading Comprehension

A multiple-aptitude battery, on the other hand, consists of a number of pre-combined homogeneous tests, each of which
measures a specific ability. The homogeneous tests are, in certain
instances, also grouped to obtain measurements in more general
aptitude fields. For example, the nine tests of the Differential Aptitude
Tests Form K (DAT-K) are grouped into six aptitude categories that
can be more easily related to specific occupations than the individual
tests (see Table 10.3) (Coetzee & Vosloo, 2000, p. 44). Similar to the
DAT-K, the Verbatim & Numeratum (V&N) (JvR, 2017) consists of
scales grouped into categories to measure general verbal and
numerical reasoning ability. The Psytech SA (2017) batteries are also
multiple-aptitude batteries that were designed in terms of a
recommended minimum level of education. The Technical Test battery
(TTB2) and the Clerical Test Battery (CTB2) are examples of these.
The administration and scoring procedures are similar to those for
group measures, and the same advantages and limitations apply.
Although most items are multiple-choice items or require only simple
responses, tests such as those measuring coordination and writing
speed involve more complex responding and scoring procedures. In
some instances the tests are administered in paper-and-pencil or
online format, and the scoring and reporting are done online (e.g.
many of the Psytech batteries and the DAT versions). The SHL Verify
range of ability tests, on the other hand, is only available online. An
aptitude test can also be administered as an individual test. For
example, we can administer the Aptitude Tests for School Beginners
(ASB) individually in a context where it is often helpful to observe the
child’s behaviour, work methods, and other non-cognitive factors that
could influence performance (Olivier & Swart, 1974).
Table 10.3 Grouping of the tests of the DAT-K in aptitude fields

APTITUDE                   TESTS

Verbal aptitude            Vocabulary; Verbal reasoning; Reading comprehension; Memory (Paragraph)

Numerical aptitude         Calculations

Visual-spatial reasoning   Non-verbal reasoning: Figures; Spatial visualisation 3D

Clerical aptitude          Comparison

Memory                     Memory (Paragraph)

Mechanical aptitude        Mechanical insight

10.5.1.2 The application of results


The scores for each measure in a multiple-aptitude battery are converted to standard scores, so the individual now has a set of scores for different aptitudes. His or her profile of aptitudes can be compared with that of others in the norm group, or the individual's strengths and weaknesses can be identified. As indicated, there is an
educational context and it consists of a standard form (Form R) and
an advanced form (Form S) for Grades 7 to 10 as well as a standard
form (Form K) and an advanced form (Form L) for Grades 10 to 12. As
discussed, specific aptitude tests and multiple-aptitude batteries used
in industry are also often graded in terms of a minimum educational
level and/or a suitable job level (or levels).
Educational context
The ASB (HSRC, 1994) measures level of development with regard to
a number of aptitudes that are important to progress at school. This
test battery is used to determine if children are ready to enter school.
It aims to obtain a comprehensive picture of specific aptitudes of the
school beginner, and evaluates the cognitive aspects of school
readiness. The ASB comprises eight tests: Visual Perception, Spatial
Orientation, Reasoning, Numerical, Gestalt, Coordination, Memory,
and Verbal Comprehension. The ASB is an educationally focused
screening measure that, similarly to measures designed to assess
global developmental progress and delays, is used to identify
children who are potentially at risk for developmental difficulties.
These children are then further assessed using diagnostic measures
(see Box 10.2).
The aptitude profile on the DAT is used together with
results from other tests and information on academic performance for
guidance on decisions regarding which type of secondary school
children should attend (e.g. academic or technical), subject choice,
and the field of study after school. Combinations of aptitudes are
considered when making these decisions and conclusions are not
based on a single score. For example, if a boy in Grade 12 scores
above average on Numerical Aptitude and Visual-spatial Reasoning
on the DAT-K, he could consider a technical or scientific field of study
or occupation depending on his performance in mathematics at
school.
Industrial context
Different aptitudes and aptitude fields are also important for success in
different occupations. A number of aptitude tests and multiple-aptitude
batteries have been developed to aid in selection and placement in
industry. For example, the combination of Verify tests for information
technology staff that is proposed in Table 10.2 is suitable for the
selection, development, and promotion of staff working in information
technology such as software engineers, system analysts,
programmers, and database administrators. Some support for the
predictive validity of the Basic Checking and Audio Checking ability
tests in the case of call-centre operators was reported by Nicholls,
Viviers, and Visser (2009). However, the level of skill that a person
may reach also depends on his or her general intellectual ability,
interests, personality traits, attitude, motivation, and training.
Research in the South African context
When aptitude profiles are compared, it is assumed that everyone
being assessed has more or less the same level of experience with
regard to the different aptitudes. This is not necessarily true in a
context where people’s backgrounds might not be the same. The
composition of the norm group with regard to language and culture
should, therefore, be taken into account when interpreting results. In
1994, norms were established on the ASB for all South Africans
(HSRC, 1999/2000). Instructions have also been translated into
numerous languages (Northern Sotho, Tswana, Xhosa, and so on).
The DAT is available in English and Afrikaans. Some forms of the
DAT have been standardised for all population groups and continued
research has resulted in, amongst others, a large-scale
standardisation of the DAT-L in Namibia.
In the case of the tests referred to that are used in an industrial
context, cross-cultural research and norming for samples
representative of the South African population are taking place on a
continuous basis. Dowdeswell and Joubert (2014), for example,
reported that there was limited evidence of cheating on Verify aptitude
tests in a mixed sample of South African job applicants. The Verify
format provides for the online (unsupervised) administration of the
tests and a supervised verification administration of the same tests,
thus enabling the user to determine the prevalence of cheating.

10.5.2 Measures of specific cognitive functions

Consider the following scenario:


Jane, a 25-year-old woman, was involved in a motor accident. She
complains that she is not functioning as well at home or at work as she used
to before the accident, and she is referred for neuropsychological
assessment. The clinician needs to determine which cognitive functions have
been affected by the accident, and to what extent. This will help him or her to
explain Jane’s cognitive and behavioural deficits, and to plan rehabilitation.

Would an individual intelligence test be included in a test battery to assess Jane's functioning after the accident? From Box 10.1 it should
be clear that the subtests of an individual intelligence test measure not
only general ability but also specific cognitive functions. Hypotheses
about areas in which an individual might be experiencing problems (or
in which he or she is expected to do well) can therefore be based on
performance in combinations of subtests. So, in answer to the
question posed at the start of this paragraph, it is possible to use the
subtests of individual intelligence tests for psychodiagnostic purposes.
In this regard, Shuttleworth-Edwards (2012, 2016) argues in favour of
the diagnostic use of the Wechsler Scales in neuropsychological
practice and explains how a protocol obtained on this test can be used
in a cross-cultural context. To ensure a match between the individual
and the comparison group, fine levels of stratification are essential in
compiling the comparison group. Tests constructed to measure
particular functions can be used on their own or to explore the
hypotheses derived from individual intelligence tests regarding
strengths and weaknesses.
Tests for specific cognitive functions can be grouped in a number
of categories. The scope of this textbook does not allow for a
comprehensive discussion of all the categories. Only measures of
verbal functions and language skills, perceptual functions, spatial and
manipulatory ability, and motor performance are briefly discussed in
this section. The following functions are not covered: orientation;
attention, concentration and tracking; memory; concept formation and
reasoning; and executive functions.
10.5.2.1 Description and aim
The aspects measured by measures of verbal functions and
language skills include speech, reading, writing, and comprehension
of spoken language. With less serious problems, standardised
language-related scholastic achievement measures that are discussed
in the next section can be used to obtain information about the nature
of a problem and to recommend steps for remediation. In the case of
language dysfunctions caused by brain damage (called aphasia), a
more comprehensive assessment by a specialist is necessary. The
Token Test is an example of a screening measure for aphasia (Lezak,
Howieson & Loring, 2004). The test material consists of tokens –
shapes (circles and squares) in two sizes (large and small) and five
colours. The test items require the test-taker to follow relatively simple
instructions such as: ‘Touch the yellow circle with the red square’ (this
is not an actual item). To do this, the test-taker must understand the
token names and must also know the meaning of the verbs and
prepositions in the instructions. The Token Test is sensitive to
deviations in language performance related to aphasia. However,
once a person is known to be aphasic, a more comprehensive
diagnostic measure or test battery should be administered.
Perceptual measures deal with visual, auditory, tactile and
olfactory functions – that is, vision, hearing, touch and smell. For
example, line bisection tests provide an indication of unilateral visual
inattention, in other words, failure to perceive unilateral stimulation
(Lezak et al., 2004). One or more horizontal lines are presented and
the test-taker is asked to divide the line by placing a mark at the
centre of the line. Cancellation tasks are also commonly used to test
for visual inattention. Many techniques are available for the
assessment of the verbal auditory functions but only a few measures
of non-verbal auditory functions are available (Lezak et al., 2004). The
face-hand tests and the finger recognition tests are examples of
measures of tactile functions (Gregory, 2010). These measures
consist of simple tasks: in the first type of measure the assessment
practitioner touches the subject on the hand and/or the cheek and the
subject must then indicate where he or she felt the touch. Finger
recognition tests require identification (pointing or verbally) of the
finger or set of fingers touched. In both these types of measures a
series of touches is presented while the test-taker’s eyes are either
open or closed, depending on the measure used.
Measures of constructional performance assess a combination
of perceptual functions (more specifically, visuoperceptual functions),
spatial ability, and manipulatory ability or motor responses (Lezak et
al., 2004). The activities included in these measures involve either
drawing (copying or free drawing) or assembling (two-dimensional or
three-dimensional construction). The Bender-Gestalt Test is a widely
used drawing test (Anastasi & Urbina, 1997). It consists of relatively
simple designs or figures that are presented to the test-taker one at a
time. He or she has to copy each design as accurately as possible. To
identify specific deficits, the administrator needs to observe how the
test-taker executes the drawing task and what type of errors he or she
makes (Lezak et al., 2004). For example, it is noted whether a figure
was rotated, if a hand tremor is evident, or if separate designs overlap.
If a measure that involves assembling tasks (see the Block Design
core subtest of the WAIS-IVSA described in Box 10.1) is also included
as part of the evaluation process, the assessment practitioner will be
able to discriminate better between the visual and spatial components
of constructional problems.
Measures of motor performance assess manipulative speed and
accuracy, among other things (Lezak et al., 2004). The Purdue
Pegboard Test was developed to measure dexterity when selecting
employees for manual jobs such as assembly, packing, and operation
of certain machines. The test-taker has to place a number of pegs into
holes with the preferred hand, then the other hand, and then both
hands simultaneously. The score is the number of pegs placed
correctly in a limited time (30 seconds for each trial). In the Grooved
Pegboard test each peg has a ridge along one side and the subject
has to rotate the peg in order to fit it into a slotted hole. This test
measures both manual dexterity and complex coordination.
Measures of specific cognitive functions are administered
individually and qualitative interpretation of responses and test
behaviour are important sources of information. The assessment
practitioner should have the experience to accommodate test-takers
with mental and/or physical handicaps. In addition, there are a large
number of measures available for measuring the different cognitive
functions. The assessment practitioner should study the test manuals
and other relevant literature on the administration and scoring
procedures for a specific measure.
10.5.2.2 The application of results
Diagnostic
Depending on the reasons for referral, the assessment practitioner
can use a combination of relevant tests to measure a number of
different cognitive functions. This is an individualised approach and
implies a different set of tests for each client. Alternatively, the
assessment practitioner can use a standardised test battery, which
consists of a fixed set of subtests, each measuring a specific function
or an aspect of a function. By using the same set of tests, the
assessment practitioner can compare the results of different persons.
Measures of specific cognitive functions are used in
neuropsychological assessment to detect brain impairment and to
determine which functions have been affected, and to what extent.
The specific aspect of a function with which a person is experiencing
problems can also help to identify and localise the impaired brain
area. A person’s score on a particular measure is evaluated in terms
of research information available for that measure. This will show how
his or her performance compares with what can be expected from a
‘normal’ person, and possible brain damage can be identified. A
qualitative analysis of responses also provides more information on
the nature of the problem. An example is the inability of brain-
damaged individuals to process as much information at once as
normal individuals can. This can be seen in brain-damaged
individuals’ disorganised approach to copying complex figures. As a
result of the diversity of disorders and the corresponding behavioural
manifestations, experience is required to identify problems and plan
rehabilitation. Therefore, a team of specialists is usually involved in
the assessment process and various factors other than the test scores
(e.g. medical history, interview information) are also taken into
account (see also Chapter 15).
Research in the South African context
As yet, many of the measures referred to in this section have not been
adapted or standardised for South Africa. A disconcerting trend is the
consistently lower performance on neuropsychological tests when
comparing South African subgroups to the original norm groups
(Lucas, 2013). Research on the performance of local populations on
these tests results in the availability of more relevant comparative data
for individuals who form part of these populations (e.g. Gadd &
Phipps, 2012; Skuy, Schutte, Fridjhon & O’Carroll, 2001; Van Wijk &
Meintjies, 2015a, 2015b). In addition, the development of local tests
such as the Groote Schuur Neurocognitive Battery should receive
greater support (Mosdell, Balchin & Ameen, 2010).
10.6 Scholastic tests
Although the focus of this book is on psychological assessment
measures, scholastic measures are often used in conjunction with
psychological measures in educational settings. Standardised
scholastic tests measure current attainment in school subjects or
fields of study. There are three categories of scholastic tests, namely
proficiency tests, achievement tests, and diagnostic tests. Proficiency
tests overlap with aptitude tests, but whereas aptitude tests measure a
number of cognitive abilities, proficiency tests are more focused in
terms of defined fields of study or school subjects (e.g. English-
language proficiency test for a specific school grade). Proficiency tests
are not limited to syllabus content. On the other hand, achievement
tests are syllabus-based and the items of these tests should cover
syllabus content (e.g. mathematics). In addition, diagnostic tests
measure performance in subsections of a subject domain. This
enables the assessment practitioner to determine specific areas of a
subject where a child experiences problems. The use of scholastic
tests is not limited to the school context; for example, the English
Literacy Skills Assessment (ELSA) is suitable for use in work and
training environments.

10.7 Current and new trends in cognitive assessment
A number of new trends are gaining traction in the domain of cognitive
assessment. Modern test theory, for instance IRT and Rasch analysis
methods, has distinct advantages for psychometric standards and is
the basis for CAT. Other exciting new developments are the use of
games and technology in cognitive assessment (Delgado, Uribe,
Alonso & Diaz, 2016) and ‘on the fly’ Automated Item Generation
(AIG) methods. The latter involves the assembling of items based on
real-time, rule-based models (Dai, Zhang & Li, 2016). This approach
offers cost reduction for item development, improved test security, and
theoretical improvements in test development. AIG uses cognitive and
psychometric modelling and computer technology to generate items
using measurement principles and cloning models (Geerlings, Glas &
Van der Linden, 2011). New techniques for evaluating the
psychometric quality of these items have also been developed (Gierl,
Lai, Pugh, Touchie, Boulais & de Champlain, 2016). These
developments will expand the utility of cognitive assessment and allow new technology and theories to improve the quality of the cognitive assessment that can be offered. Also
refer to Chapters 6 and 14 in this regard.
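For readers unfamiliar with the Rasch model mentioned above, its core idea is that the probability of answering an item correctly depends only on the difference between the person's ability and the item's difficulty. The following is a minimal Python sketch with illustrative parameter values:

    import math

    def rasch_p_correct(theta, b):
        # Rasch model: P(correct) = exp(theta - b) / (1 + exp(theta - b)),
        # where theta is person ability and b is item difficulty,
        # both expressed on the same latent scale
        return 1 / (1 + math.exp(-(theta - b)))

    # A person of average ability (theta = 0) attempting a relatively
    # easy item (b = -1) has roughly a 73% chance of a correct answer
    print(rasch_p_correct(0.0, -1.0))

CAT builds on this by repeatedly selecting the item whose difficulty best matches the current estimate of the test-taker's ability, which is what allows adaptive measures to be shorter without losing precision.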
Use of mobile technology is another new trend in assessment.
According to Arthur, Doverspike, Muñoz, Taylor, and Carr (2014),
with increasing technological developments and more assessments
conducted on mobile, hand-held small-screen devices such as
smartphones, tablets or laptops, there is a need to investigate the
equivalence of assessment results for various administration modes.
Measurement equivalence has been found with regard to the
constructs measured, but while scores on personality measures were similar for mobile and non-mobile devices, scores on cognitive measures were substantially lower on mobile devices than on non-mobile devices. Similar results were
found by Morelli, Mahan, and Illingworth (2014) with means on
cognitive scores half a standard deviation lower for those tested on
mobile devices compared to those tested on non-mobile devices.
These results show that it is not prudent to simply assume that
measurement across different modes will be similar, and research is
required to investigate the similarities and differences of results
obtained by means of different technological devices.
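The 'half a standard deviation' reported above is an effect size (Cohen's d). The sketch below shows how such a mode comparison might be quantified; the two score arrays are invented purely to illustrate the calculation and do not come from the studies cited.

    import statistics

    def cohens_d(a, b):
        # Standardised mean difference using the pooled standard deviation
        na, nb = len(a), len(b)
        pooled_var = ((na - 1) * statistics.variance(a) +
                      (nb - 1) * statistics.variance(b)) / (na + nb - 2)
        return (statistics.mean(a) - statistics.mean(b)) / pooled_var ** 0.5

    # Hypothetical cognitive scores for the two administration modes
    non_mobile = [24, 27, 22, 30, 26, 25, 28, 23]
    mobile = [23, 26, 21, 28, 25, 24, 27, 21]
    print(cohens_d(non_mobile, mobile))  # roughly 0.5: mobile about half an SD lower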
In the South African context, the Africanisation of stimulus material
for the measurement of cognitive ability holds promise for the
development of cognitive assessment measures that should be more
familiar to the majority of test-takers (Bekwa, 2016). In addition, the
possibility of developing parallel versions of indigenous cognitive
measures for different cultural or language groups, where some of the
content differs to allow for greater familiarity while still tapping the
same construct, is gaining momentum. For example, Malda, Van de
Vijver, and Temane (2010) developed parallel versions of cognitive
tests for Afrikaans- and Tswana-speaking cultural groups. In a short-
term memory test, the Afrikaans version included words such as
‘computer’ and ‘camera’, while the Tswana version included words
such as ‘tuck shop’ and ‘braids’. Culturally sensitive content was found
to have a positive impact on test performance, especially for more
complex cognitive tasks (Malda, Van de Vijver & Temane, 2010).
Contextual approaches to assessment are furthermore
recommended to ensure fairness in multicultural settings (Laher &
Cockcroft, 2017). Theron (2013) illustrated the use of such an
approach in assessing school readiness. Qualitative indicators are
used in addition to quantitative scores to obtain a comprehensive
picture of a child’s intellectual readiness as well as his or her
emotional, social, and physical readiness. This information implies
greater sensitivity to contextual factors (including culture), and can be
used to develop appropriate interventions.

10.8 Conclusion
To understand why the theories and measurement of cognitive
functioning have sparked such intense debate and emotional
reactions, we have to consider the psychosocial issues involved. In
most Western societies, either directly or indirectly, various
‘advantages’ are associated with a specific level of cognitive
functioning – ‘intelligence’ as it is defined in psychometric terms.
These include aspects such as a good education, a corresponding
higher level of employment, higher standing in the society, a higher
level of income, and so the list goes on. At a country level, IQ has
been shown to correlate positively with countries' Gross Domestic Product (GDP) and attained levels of education, as well as with other social and economic factors (Roivainen, 2012). Furthermore, socio-
economic status (SES) is positively associated with both cognitive
performance and academic achievement of young children and
adolescents (Von Stumm, 2012). However, after studying this chapter,
you should realise that measures of cognitive functioning differ in
terms of the underlying assumptions. The interpretation of results is
limited to what each test measures, and a measure should always be used in the context for which it was intended. Technical
features, such as the reliability and validity of the measure, should be
considered, as well as the population or norm group for whom the
measure was standardised. In addition, research on differences within
and between groups should make us aware of the need to invest in
people through social or educational programmes.

CHECK YOUR PROGRESS 10.1


10.1 Critically discuss the meaning of an intelligence score.
10.2 Discuss the different aspects that need to be considered when
investigating the cross-cultural equivalence of measures.
10.3 Discuss the use and application of measures of general
cognitive functioning.
10.4 Discuss the use and application of measures of specific abilities.
10.5 What factors will you consider when selecting a cognitive
measure to be used in a specific situation?
10.6 You are working in the human resources department of a mining
company that has an active personnel development programme. Your
job is to identify individuals with the potential to benefit from an
administrative training course, a language course, and/or a technical
course. What type of measure(s) would you use? Justify your answer.

When you have finished responding to the questions, it will be time to


continue your travels through the Types of Measures Zone. At the
next stop, you will have an opportunity to investigate measures of
affective functioning.
* The role of race is by no means static. However, in the South African context it remains a
factor in the categorisation of people in terms of opportunities and resources. As such, the
related classifications were used.
Measures of well-being
CAROLINA HENN
Chapter 11

CHAPTER OUTCOMES
This stop follows a similar pattern to the first two stops in the Types of Measures
Zone. In what way is the pattern similar? You will first be provided with some
theoretical perspectives and issues, after which you will be introduced to
various measures.
After completing this chapter, you will be able to:
› understand well-being as a broad concept within psychology
› understand the impact of well-being in the world of work
› name and describe the assessment tools that can be utilised to measure
various constructs related to well-being.

11.1 Introduction
In this chapter you will be introduced to the concept of ‘well-being’ in
general as well as the theoretical underpinnings of well-being. You will
also be introduced to measures that can be used to assess well-being,
and its subdimensions.

11.2 Well-being
The overriding impression one gets from reading the available
literature on well-being is of the vastness of the concept. Different
authors provide different definitions, and also emphasise different
aspects of well-being. Roodt (1991) attributes this to the numerous
disciplines that investigate well-being. These disciplines include
psychology, sociology, medicine, and economics. He also points out
that the use of numerous terms (including well-being, psychological
well-being, life satisfaction, happiness, and quality of life) further
complicates understanding of well-being. There does seem to be
consensus, however, that well-being is a multidimensional construct
with numerous domains. Moreover, these different domains exercise
mutual influence on each other. In terms of schools of thought within
psychology, well-being is most at home within positive psychology.
In the following sections attention will be given to providing a fairly
inclusive definition of well-being, explaining its link to positive
psychology, and discussing some of the domains of well-being.

11.2.1 Defining well-being


The World Health Organisation (WHO) defines health as ‘a state of
complete physical, mental, and social well-being and not merely the
absence of disease or infirmity’ (Preamble to the Constitution of the
World Health Organisation, p. 100). It is interesting to note that this
definition has remained unchanged since 1948, when it first entered into force. In addition, the WHO also provides a definition of
mental health, namely ‘a state of well-being in which every individual
realises his or her own potential, can cope with the normal stresses of
life, can work productively and fruitfully, and is able to make a
contribution to her or his community’ (WHO, 2008: Online Question
and Answer: What is mental health? para.1). This definition
encompasses both subjective elements (i.e. within the individual),
such as coping with stress, as well as objective elements (i.e. between
the individual and his or her environment), such as making a contribution to
the community.
Literature on well-being supports the above definitions by focusing
on two key elements of well-being, namely an absence of
disease/illness on the one hand, and the presence of physical and
emotional health on the other. In many writings the term ‘psychological
well-being’ is utilised to refer to this absence of negative aspects such
as illness, and the presence of positive factors such as happiness.
Despite the ‘psychological’ prefix, the term is still applied in a broad
sense that encompasses both emotional and physical aspects.
Diener, Kesebir, and Lucas (2008) highlight that well-being refers
to the full range of aspects that contribute to an individual’s
assessment of his or her quality of life, including social aspects,
physical and mental health, and feelings of happiness and safety. In
addition, they distinguish between non-specific well-being indicators
such as life satisfaction, and more specific ones such as job
satisfaction. Blanchflower and Oswald (2002) refer to the former as
‘context-free’ well-being, and the latter as ‘context-specific’ well-being
(p. 1360). A person may thus rate his or her overall well-being as
positive, but single out a specific aspect (e.g. physical fitness) as
negative.

11.2.2 Positive psychology and well-being


It is most probably the second element of well-being, namely the
presence of positive aspects of functioning, that places it within the
domain of positive psychology. Positive psychology emphasises that
psychology is not merely about disease and finding cures, but also
about positive aspects of living such as happiness and virtue
(Seligman, 2003). Keyes and Haidt (2003) point out that positive
psychology is about helping people to not merely exist, but to flourish.
However, they are also quick to mention that positive psychology’s
view of life and people is realistic and balanced. In other words, they
do not deny the existence of challenging and daunting life
circumstances and negative affect such as sadness.
This view is also expressed by Peterson (2006) who states quite
succinctly that ‘everyone’s life has peaks and valleys, and positive
psychology does not deny the valleys’ (p. 4). The focus remains,
though, on the fact that people are generally resilient and manage to
cope with life’s difficulties, sometimes in ways that astound those
around them. Ryff and Singer (2003) refer to the many well-
documented empirical findings of individuals who were able to
preserve and even improve their well-being, despite difficult
challenges. One is reminded, for example, of South Africa’s
Paralympic gold medallist swimmer, Natalie du Toit, who lost a leg in a
motorcycle accident at a time when she was already a swimmer with
extremely good prospects. Despite this loss and obvious adaptations
she had to make, she continues to aspire to ever greater
achievements. By doing so, she serves as a source of inspiration for
numerous other people.

11.2.3 Domains of well-being


Deci and Ryan (2008) indicate that research on well-being generally
falls within one of two traditions: the hedonistic or eudaimonic
tradition. The hedonistic view on well-being focuses on subjective
experiences of well-being, such as happiness, and positive and
negative affect. The eudaimonic perspective is concerned with
maximising one’s full potential, and living ‘as one was inherently
intended to live’ (Deci & Ryan, 2008, p. 2). In addition, there are also
other fields of study, such as economics, that evaluate well-being in
terms of economic indicators such as employment and income.
11.2.3.1 Subjective well-being
Discussions on the experience of subjective well-being (SWB) are
best placed within the hedonistic tradition. Diener, Sapyta, and Suh
(1998) define SWB as an individual’s evaluation of his or her quality of
life. This evaluation involves multiple constructs, though (Diener,
Scollon & Lucas, 2009), and can be in terms of cognitive judgements
(e.g. marital satisfaction) as well as affect (e.g. the presence of
positive emotions and the absence of negative emotions). There is an
abundance of literature on various determinants of SWB. A few of
these determinants that feature prominently are:
Quality of Life (QoL)
According to Roodt (1991) quality of life can be determined in two
ways. Firstly, social indicators such as crime and unemployment rates
can be utilised to judge quality of life. Secondly, quality of life is
determined by individuals’ subjective perceptions of different aspects
of their lives. He also ranks QoL as an overarching concept that
encompasses psychological well-being. It is interesting to note that,
although QoL is generally considered to be a subjective indicator of
well-being, it does contain an objective component as well, which is
expressed in, among others, social factors such as neighbourhood
and crime. Ring, Hofer, McGee, Hickey and O’Boyle (2007) point out
that empirical findings strongly indicate the specific nature of QoL (in
other words, there is much variation in terms of individuals’
perceptions thereof).
Happiness
In some writings, happiness is used interchangeably with subjective
well-being (cf. Deci & Ryan, 2008), whereas Roodt (1991) considers
happiness to be a subcomponent of psychological well-being.
Veenhoven (1991) defines happiness as ‘the degree to which an
individual judges the overall quality of his life-as-a-whole favourably’
(p. 1). He further proposes that this global appraisal consists of an
affective component (which refers to the degree to which emotions
experienced are considered pleasant), and a cognitive component
(which is determined by a person’s perceptions of having his or her
expectations met). Argyle and Crossland (1987) identified three key
components of happiness, namely a high frequency of positive affect,
the absence of negative feelings, and a high average level of
satisfaction over time. Diener et al. (2008) present the following
empirical findings on the benefits of happiness: improved
relationships, higher income, improved health, increased behaviours
that benefit the community such as volunteering, and so on. In short,
according to them, ‘happiness fosters effective functioning across a
variety of domains’ (p. 44).
Life satisfaction
According to Bekhet, Zauszniewski, and Nakhla (2008), life
satisfaction is ‘similar to happiness but lacks any reference to a state
of mind’ (p. 16). This implies that happiness has an emotional
component attached to it that satisfaction may not necessarily have.
Roodt (1991) lists life satisfaction as a subcomponent of psychological
well-being. Life satisfaction can be divided into two separate
components, namely overall life satisfaction, and domains-of-life
satisfaction (i.e. satisfaction in specific areas of your life such as
finances, physical health, and relationships).
Positive and negative affect
Happiness/life satisfaction, on the one hand, and a balance between positive and negative affect, on the other, are considered the two pillars on which subjective well-being rests (cf. Busseri, Sadava & DeCourville, 2007; Deci & Ryan, 2008; Ryff & Keyes, 1995). Descriptions of positive affect include joyful, happy, pleased, and enjoyment/fun, while descriptions of negative affect include worried/anxious, depressed, frustrated, angry/hostile, and unhappy
(cf. Diener & Emmons, 1984). Research has indicated that certain
personality traits are related to the experience of affect (for example,
the relationship between personality traits such as extraversion and
optimism on the one hand, and emotional well-being on the other) (Reis, Sheldon, Gable, Roscoe & Ryan, 2000). Research findings into
the benefits of positive emotion have indicated that it can promote
health and longevity, and can lead to more likelihood of success in
careers (Diener et al., 2008).
Social and economic determinants of subjective well-being
Social and economic factors that can be expected to impact an
individual’s perception of well-being include crime rate,
neighbourhood, income, and unemployment. In their research on the
experience of well-being over time in Europe and the US,
Blanchflower and Oswald (2002) found that women, married people,
highly educated people and those with parents whose marriages are
still intact are the happiest. Higher income was also associated with
greater happiness, and happiness levels were low among unemployed
people. These findings indicate that external factors do impact on well-
being.
11.2.3.2 Other dimensions of well-being
Carol Ryff’s model of well-being is one of the best-known within the
eudaimonic tradition. She developed this six-dimensional model as a
response to the emphasis placed on subjective experiences of well-
being in research. The six dimensions in her model are described as
follows (Ryff & Keyes, 1995):
Self-acceptance
Having a positive disposition towards oneself; acknowledging and
accepting all aspects of oneself, including good and bad aspects; and
feeling positive about the past.
Positive relations with others
Having warm, trusting relationships with other people; concern about
others’ well-being; experiencing empathy, affection, and intimacy; the
ability to give and to receive in relationships.
Autonomy
Self-determination and independence; internal locus of control; can
withstand social pressure to think and act in expected ways; can
evaluate oneself in terms of one’s own standards and not those of
others.
Environmental mastery
Competent management of the environment; controlling diverse and
complex activities within the environment; can choose contexts
suitable to personal needs, and aligned with personal values.
Purpose in life
Having goals and objectives for living; has direction in life; finding
meaning in life (past and present); holds beliefs that provide purpose
in life.
Personal growth
Feels that one is continuously growing and developing; can see
improvement within oneself and in one’s behaviour as time passes;
open and ready for new encounters and experiences; experiencing
a sense of achieving and optimising one’s potential; increased self-
efficacy and self-knowledge are demonstrated by the ways in which
one changes.

In a criticism of Ryff's model, Diener et al. (1998) commend her for focusing on positive aspects of well-being, and accede that some of
the characteristics outlined by her (e.g. purpose, mastery, and positive
self-regard) do contribute to subjective well-being. They strongly
criticise her model, though, for underestimating the importance of
subjective well-being. Ryff developed her theoretical framework of
well-being, and its dimensions, through a comprehensive literature
review including psychological theories (e.g. the humanistic theories of
Maslow and Rogers), and philosophical understandings of ‘the good
life’ (Dierendonck, Diaz, Rodriguez-Carvajal, Blanco & Moreno-
Jiménez, 2008, p. 474). It is precisely this, the way in which she arrived at her multidimensional model, that forms a considerable part of Diener et al.'s (1998) critique, as Ryff does not incorporate an individual's own definition of well-being in these dimensions. Ryff and Keyes (1995), on
the other hand, are of the opinion that too much emphasis has been
placed on positive affect and happiness as determinants of well-being,
at the cost of other important aspects of functioning such as meaning
in life and realising one’s potential. Diener et al. (2008) counter this
argument by emphasising research findings that subjective feelings,
such as happiness, do in fact facilitate behaviours that foster meaning
in life and the actualisation of potential, such as increased trust,
ethical behaviour and volunteering (i.e. outcomes that contribute
positively to society).
In conclusion, it can be said that a discussion of well-being is
better served by a ‘both … and’ approach than an ‘either … or’
approach. In other words, both the hedonistic and eudaimonic
traditions serve to enhance and broaden our understanding of well-
being.
You may have noticed that the foregoing sections on well-being did
not include a discussion of physical and mental illness, other than
mentioning the absence thereof as an indicator of health and well-
being. We will, however, discuss this in the following section when we
look at the impact of well-being on the business world.

❱❱ CRITICAL THINKING CHALLENGE 11.1

Think of a person you know who, in your opinion, exemplifies well-being in all its aspects. Based on your knowledge of and experiences with
this person, develop a model of well-being that encompasses both
hedonistic and eudaimonic understandings and perceptions thereof. Your
model should also indicate the interrelatedness of concepts of well-being.

11.2.3.3 Mental illness


Positive psychology offers an optimistic stance from which to explore
well-being. However, it is also important to view well-being from an illness (or unwellness) perspective. Of particular interest to us is mental
illness. In this regard, we particularly need to be aware of the
pervasiveness of mental illness, and its negative impact on people all
around us. Herman et al. (2009) reported that the lifetime prevalence
for psychiatric disorders is 30.3 per cent in South Africa, with an even
higher projected lifetime risk of 47.5 per cent. In this section we will
review the literature regarding mental illness with specific focus on
depression and anxiety as these are the most commonly diagnosed
disorders in South Africa (cf. Herman et al., 2009).
11.2.3.3.1 Depression
Depression is a psychiatric mood disorder. There are different
categories of mood disorders and these are described in the
Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5)
(APA, 2013). The most commonly known are major depression and
bipolar disorder. Major depression is characterised by symptoms such
as a depressed mood, hypersomnia or insomnia, weight loss or weight
gain, feelings of hopelessness and worthlessness, suicidal ideation or
suicidal intention, loss of interest in activities that used to interest the
person, fatigue and loss of concentration, and psychomotor
retardation or psychomotor agitation. Bipolar disorder consists of two
opposing mood states that alternate: a depressed mood state, which
is similar to the description of major depression, and a manic state
during which the person experiences symptoms such as feeling
invincible, talking very fast (though with little content), having no need for sleep, and being impulsive. A manic cycle is invariably followed by a depressed cycle. Two other categories of mood disorder are also worth mentioning. Dysthymia is a milder form of major depression: the sufferer is able to maintain day-to-day functioning, but experiences a depressed mood for more days than not. In a similar fashion, cyclothymia is a milder form of bipolar disorder.
Among South Africans, depression has a lifetime prevalence of 9.8
per cent, and a 12-month prevalence of 4.5 per cent (Herman et al.,
2009). It is no wonder that depression has been referred to as the
‘common cold of mental illness’ (Gelman et al., 1987, p. 42).
11.2.3.3.2 Anxiety
All of us experience bouts of anxiety. For instance, you may
experience anxiety before an exam. This is normal, given the context.
In fact, if you do not experience some functional anxiety you may not
study as hard as you should. However, when anxiety becomes so
severe that it becomes debilitating (hindering you from effective daily
functioning), you may have an anxiety disorder. Anxiety is
experienced in three ways: in the cognitive domain it is expressed in,
for example, thoughts of impending doom, preoccupation with fears
and constantly feeling on edge; in the behavioural domain it is
expressed in avoidance of anxiety-inducing situations, for example
avoiding social events; in the bodily domain anxiety is expressed
through symptoms such as dry mouth, nausea, increased blood
pressure, heart palpitations, and shivers (Sue, Sue & Sue, 1997). As
with depression, there are different categories of anxiety disorders,
including posttraumatic stress disorder, panic disorder, agoraphobia,
generalised anxiety, and phobias. Herman et al. (2009) found anxiety
disorders to be the most prevalent group of lifetime disorders (15.8 per
cent) in South Africa. In the same study, anxiety disorders showed a
12-month prevalence of 8.1 per cent.

11.3 Well-being in the workplace


It is a logical deduction that happy people are most likely also happy
employees, which will in turn benefit the companies they work for (as
measured by productivity, for example). In the following sections, we
will present several prominent aspects of the relationship between
well-being and the workplace. In corporate jargon the term ‘wellness’
appears to be preferred over ‘well-being’, although the two concepts
have essentially the same meaning. In our discussion we will thus also
use the two terms interchangeably.

11.3.1 Why well-being matters


There are two main reasons why employers should take cognisance
of their employees' well-being. Firstly, healthy employees are
generally better employees, which in turn impacts positively on
productivity, thus leading to improvement of the company bottom-line
(profit) (cf. Foulke & Sherman, 2005). Loeppke (2008) emphasises the
importance of making ‘strategic investments in healthcare’ (p. 106), so
as to ensure that companies can maintain a ‘workforce that is both
able and available to employers competing in a global economy’ (p.
106). Research findings indicate that employers can save money by
investing in the well-being of their employees. For example, according
to Johnson & Johnson, the company has saved $154 in healthcare
costs per participant in their wellness programme, after deducting the
cost of the programme (Mason, 1994). CitiBank (in the United States
of America) found a saving of between $4.56 and $4.73 for every dollar
spent on their employee wellness programme (Ozminkowski et al.,
2009). Baicker, Cutler, and Song (2010) conducted a meta-analysis
that included 90 organisations, and reported a medical cost saving of
$3.27 and an absenteeism cost saving of $2.73 for every dollar spent
on a wellness programme. It is also estimated that, for every £1 spent
on employing someone, it costs double that amount when the employee is
absent (Cotton, 2004). Robison (2010) reports the Gallup Inc. finding
that the annual cost of lost productivity (owing mostly to
absenteeism) for an organisation is $28 800 per year for a person with
low levels of well-being, whereas for a person with high levels of
well-being this cost is only $840 per year. Thus, caring about the wellness
of employees makes economic sense. Secondly, employers have a
moral and ethical responsibility to support their employees. This is, in
part, due to the fact that some illnesses, such as stress, can be
caused by working conditions and job demands. It is also part of an
employer's social citizenship behaviour to assist in improving
people's quality of life, and one's own employees seem a logical
starting point. In this regard, Loeppke (2008) stresses the need
to invest in health as individuals, families, employers, communities,
and nations.

11.3.2 The cost of ill health


Numerous sources indicate the cost of ill health to employers.
Mercer’s absence management survey (Bass & Fleury, 2010) shows
that the direct and indirect cost of incidental absence (i.e. mostly sick
leave) amounts to 5.8 per cent of an organisation’s payroll. It is
noteworthy that direct costs amount to 2 per cent of payroll, whereas
indirect costs amount to 3.8 per cent (in other words, almost double
the direct costs). These costs include hiring temporary workers,
overtime, business disruption, delays in product delivery, and
decreased customer satisfaction. Corrigall et al. (2007) support this
by noting that the bulk of mental-health-related costs stems from
absenteeism and reduced productivity rather than from actual medical
costs. The cost of absenteeism can therefore not be measured in direct
monetary terms only. A study done by AIC Insurance in South Africa
estimates that absenteeism costs South African companies approximately
R12 billion annually (Lilford, 2008). Of this
amount, between R1,8 billion and R2,2 billion is due to HIV/Aids
(Moodley, 2005). The Corporate Absenteeism Management Solutions
(CAMS) survey found the cost to be even higher, namely R19 billion
per year (Encouraging corporate wellness maintenance, 2007).
Mental illness (depression, in particular) and HIV/Aids appear to be
the most prevalent wellness-related issues that are currently
confronting companies, especially in South Africa.
HIV/Aids
Given the high prevalence of HIV/Aids in South Africa (10,5 per cent)
(STATSSA, 2010), there can be no doubt that the epidemic will have
a negative impact on companies. Rosen, Simon, Vincent, MacLeod,
Fox and Thea (2003) list direct costs of HIV/Aids on an individual
employee level such as medical care, benefits payments, and
recruitment and training of replacement workers. They list indirect
costs as reduced productivity on the job (presenteeism), reduced
productivity owing to absenteeism, supervisor’s time to deal with
losses in productivity, and reduced productivity while the replacement
employee learns to do the job. Costs on an organisational level
include disruptions in production, lowered morale, negative effects on
labour relations, an increase in on-the-job accidents, and increased
health-insurance costs. With South Africa being a low- to middle-income
country, the economic burden may be even heavier (cf. Guariguata et
al., 2012).
The following findings underscore the impact of HIV/Aids on
organisations: In their study at a South African mine, Meyer-Rath,
Pienaar, Brink, Van Zyl, Muirhead, Grant and Vickerman (2015) found
that an HIV-positive employee costs a company on average $13 271
annually over 20 years. Provision of anti-retroviral therapy reduces
these costs by 15 per cent. The findings presented by Rosen, Vincent,
MacLeod, Fox, Thea and Simon (2004) in terms of the costs of
HIV/Aids to businesses in southern Africa (South Africa and
Botswana) are indeed sobering: HIV prevalence among employees
ranged from 7.9 per cent to 25 per cent; HIV/Aids among employees adds
0.4 to 5.9 per cent to the companies' annual salary and wage bills;
and the cost of an HIV infection ranges from 0.5 to 3.6 times the
annual salary of the employee. In addition, they also found that
employees who ultimately terminated employment due to Aids took
between 11 and 68 more days of paid sick leave in their last year of
service; supervisors also estimated that employees who died or went on
disability due to Aids were approximately 22–63 per cent less
productive in their final year of employment. According to Rosen, Simon, Thea and
Vincent (2000) the direct costs of HIV to the employer include pension
and provident-fund contributions, death and funeral benefits, health
clinic use, recruitment and training. Indirect costs are typically incurred
through absenteeism and presenteeism (reduced productivity on the
job). According to AIC Business Insurance (Moodley, 2005)
companies lose as much as one month’s worth of work for every
employee with advanced HIV/Aids. The absenteeism rate for
employees with HIV/Aids is about three times higher than for those not
infected. Ellis (2007) reported that there is a significant correlation
between the impact of HIV/Aids in terms of cost to the organisation
and the implementation of HIV/Aids policies and intervention
programmes. Furthermore, Ellis also found that many of the
companies in the financial-services industry (>75%) and the mining
industry (>60%) have HIV/Aids policies and awareness programmes,
with the manufacturing, trade and building and construction industries
lagging behind (around 50 per cent and lower, in most instances). In
terms of access to anti-retroviral medication (ART) for the treatment
of HIV/Aids, the percentage of organisations offering ART to their
employees ranges from 3 per cent to 38 per cent, showing slower
uptake in this regard (Ellis, 2007). With regard to HIV/Aids care,
support, and treatment, 76 per cent of organisations in the
financial-services industry provide these services to employees. In the other industries
included in the survey, the uptake of HIV/Aids care, support, and
treatment is lagging behind and ranges from 10 to 40 per cent (Ellis,
2007). These findings demonstrate that, despite the strong evidence
provided in terms of the cost of HIV/Aids to the organisation, many
organisations are still holding back in terms of provision for
employees with HIV/Aids.
Mental illness
Mental ill health costs organisations approximately £1 035 annually
per employee in the UK (Hay, 2010). Hay further indicates that
approximately two-thirds of the amount is attributable to wasted
productivity. Loeppke, Taitel, Haufle, Parry, Kessler and Jinnett (2009)
indicated that depression is the most costly illness for
organisations, with anxiety fifth on the list.
In terms of absenteeism, Mall, Lund, Vilagut, Alonso, Williams, and
Stein (2015) reported that in South Africa depression and anxiety
caused an average of 27.2 and 28.2 days out of role per year,
respectively. Interestingly, Sanderson and Andrews (2006) found that
depression and anxiety are more consistently associated with
presenteeism (being at work, but with lowered productivity) than with
absenteeism. They also cite the findings of several research projects
indicating that depression is associated with more work-related loss
than most chronic medical conditions. Employees in the US lose about
an hour per week in depression-related absenteeism as opposed to
four hours lost weekly in depression-related presenteeism. In addition,
depressed employees also seem to experience more workplace
conflict.
When looking at the typical symptoms of anxiety and depression
(which often overlap), it is obvious why these disorders have such a
negative impact in the workplace, for both the individual concerned
and the organisation. Symptoms such as poor concentration and
memory problems directly affect employee performance. Depressive
and anxiety symptoms also cause behaviours such as procrastination,
increased errors and accidents, and reduced work engagement. In
severe instances, sufferers are completely unable to function. Adler,
McLaughlin and Rogers (2006) found that depressed workers had
significantly greater deficits in managing time and output tasks, even
after improvement of their symptoms. In support of their findings,
Wang, Beck, Berglund, McKenas, Pronk, Simon, and Kessler (2004)
also found major depression to be significantly associated with
decrements in productivity and task focus. They also found depression
to be more consistently related to poor work performance than any of
the other chronic conditions in their study (i.e. allergies, arthritis, back
pain, headaches, high blood pressure, and asthma). These
behaviours, in turn, influence team performance and team
relationships. In the end, organisational effectiveness and company
bottom lines are affected negatively.
The WHO (2004) reports that alcohol abuse costs South Africa
about R8,7 billion annually, due to medical costs, lost productivity,
violence and crime. Substance-use disorders have a lifetime
prevalence of 13.3 per cent, with alcohol abuse having the highest
lifetime prevalence (14 per cent) (Herman et al., 2009); these
statistics are concerning. It is interesting to note, though, that Foster
and Vaughan (2005) found that the cost of substance-abuse-related
absences to employers is negligible, and at best an incidental cost
that does not justify corporate investment in the prevention and
treatment thereof. Such a viewpoint, however, does not take the moral
responsibility of the employer into account. This is particularly true in
South Africa, where there are still many barriers to treatment (e.g.
cost, insufficient numbers of suitably trained professionals, and
insufficient treatment centres) (Pasche & Myers, 2012).

11.3.3 Well-being in the workplace


Most models of workplace well-being concentrate on negative
aspects, such as job stress, and overlook positive factors (e.g. job
satisfaction) (Schaufeli, Bakker & Van Rhenen, 2009). The Job
Demands-Resources (JD-R) model attempts to combine both positive
and negative aspects of employee well-being in one theoretical
framework (cf. Demerouti & Bakker, 2011; Schaufeli et al., 2009). As
such, the JD-R model is a leading model assisting us in obtaining a
balanced understanding of the factors that impact well-being at work.
The JD-R model operates under two main assumptions. The first
assumption is that, although all occupations have unique factors (e.g.
those that cause stress), these factors can all be grouped under two
general categories, namely job demands and job resources (Bakker &
Demerouti, 2007). Job demands are defined as ‘those physical, social,
or organisational aspects of the job that require sustained physical or
mental effort and are therefore associated with certain physiological
and psychological costs (e.g. exhaustion)’ (Demerouti, Bakker,
Nachreiner & Schaufeli, 2001, p. 501). Examples of demands are high
work pressure, high workload and unusual working hours (Demerouti
& Bakker, 2011). Job resources ‘refer to those physical, psychological,
social and organisational aspects of the job that are either/or:
functional in achieving work goals, reduce job demands and the
associated physiological and psychological costs, and stimulate
personal growth, learning and development’ (Bakker & Demerouti,
2008, p. 312). Resources include aspects such as supervisor and co-
worker support, autonomy, and role clarity (Demerouti & Bakker,
2011).
The second main assumption of the JD-R model is that there are
‘two different underlying psychological processes’ that influence the
development of job strain and that foster motivation (Bakker &
Demerouti, 2008, p. 313). The first of these processes is the health-
impairment process: job demands drain employees’ resources (such
as time, emotional energy, and social support) and this can lead to
negative outcomes such as burnout and fatigue (Bakker & Demerouti,
2008). The second psychological process is motivational: this process
assumes that job demands can motivate employees, for example, to
reach their career goals (Demerouti & Bakker, 2011). Job resources
can also act as motivators. For example, positive feedback from a
supervisor and collegial support can enhance job satisfaction and
increase organisational commitment. Job resources also act as a
buffer against the negative effects of high job demands (cf. Demerouti
& Bakker, 2011). The basic JD-R model is illustrated in Figure 11.1
below.
Figure 11.1 The JD-R Model, adapted from Bakker & Demerouti (2007)

Bakker and Oerlemans (2010) discuss several factors that relate to
subjective well-being at work: positive factors are work engagement,
happiness at work (similar to happiness, discussed earlier in the
chapter), and job satisfaction; negative factors are workaholism and
burnout. Burnout and work engagement are the two factors that have
been shown to be of particular importance in the JD-R model’s
explanation of employee well-being (cf. Llorens, Bakker, Schaufeli &
Salanova, 2006). Well-balanced demands and resources can lead to
positive well-being outcomes. On the other hand, excessive demands
without appropriate and adequate resources can lead to negative well-
being outcomes.
Positive well-being outcomes:
Work engagement: Work engagement is defined as ‘a positive,
fulfilling, work-related state of mind that is characterised by vigor,
dedication, and absorption’ (Schaufeli, Salanova, González-Romá &
Bakker, 2002, p. 74). Vigour is characterised by high energy levels,
persistence, and resilience while working; dedication is marked by a
sense of meaning, pride in one’s work, and viewing work as a
challenge; absorption refers to being immersed in work and finding it
hard to disconnect from it (Schaufeli et al., 2002).
Job satisfaction: A frequently used definition of job satisfaction is the
one proposed by Locke (1969), namely a ‘pleasurable emotional state
resulting from the appraisal of one’s job as achieving or facilitating the
achievement of one’s job values’ (p. 316). Job satisfaction can
influence aspects such as turnover intention, organisational
commitment, and work engagement.
Negative well-being outcomes:
Burnout: Arguably the most often cited definition of burnout is the one
provided by Maslach and Jackson (1981), namely that it is ‘a
syndrome of emotional exhaustion, depersonalisation and reduced
personal accomplishment that can occur among individuals who do
“people work” of some kind' (p. 1). This definition was eventually
expanded to include not only human service occupations, but all
occupations (Schaufeli, Martínez, Pinto, Salanova & Bakker, 2002). It
is characterised by three dimensions, namely emotional exhaustion,
cynicism, and reduced professional efficacy (Maslach, Schaufeli &
Leiter, 2001).
Workaholism: As cited in Schaufeli, Taris and Van Rhenen (2008, p.
175), Oates (1971) first used the term to describe ‘the compulsion or
the uncontrollable need to work incessantly’ (p. 11). This definition
highlights two elements of workaholism: firstly, that workaholics spend
disproportionate amounts of time working, and secondly, that
workaholics are driven by an obsessive, internal need to work
(Schaufeli et al., 2008). Workaholism can have undesirable outcomes
such as burnout, increased work-family conflict, and decreased life
satisfaction.

11.3.4 Wellness programmes


Interest in well-being, and in wellness at work specifically, has
increased rapidly (cf. Loeppke, 2008; Wicken, 2000). Thus, many companies
have introduced wellness programmes for their employees.
Sieberhagen, Pienaar, and Els (2011) found that, of 16 organisations
surveyed, 14 introduced their employee wellness programmes after
the year 2000, which demonstrates a sharp increase. In fact, Mattke et
al. (2013) consider it customary for organisations to have employer-
sponsored wellness programmes. There is considerable evidence
testifying to the effectiveness of these programmes. Numerous
companies report favourable outcomes (cf. Bloom, 2008; Mason, 1992;
Mason, 1994), including decreased absenteeism, increased quality of
work life, and increased morale. This is supported by Mattke et al.
(2013) in their national study, which found that wellness programmes
do reduce health risks and improve health (through, for example, a
reduction in smoking and more physical exercise), and that these
benefits are retained over time. In a review of
several independent studies, Cascio (2006) concludes that wellness
programmes do reduce health risks, that these reductions are
maintained over the long term, and that the costs of illness are
reduced. The benefits, therefore, seem to justify the expense. In a
meta-analysis of literature related to the cost-saving benefits of such
programmes, Baicker, Cutler and Song (2010) reported an average
saving of $358 annually per employee, while the cost of the
programmes averaged $144. The net saving of $214 shows that the
benefits clearly outweigh the costs of wellness programmes. With the
exception of one wellness programme, all programmes reported
reduced absenteeism. In a South African study, Naicker and Fouché
(2003) reported a return on investment of R5 to R8 for every R1 spent
on employee well-being. Cloete (2015) obtained a similar finding in his
study, reporting a R6 return on investment for every R1 spent.
Despite the positive outcomes reported, low employee participation
in wellness programmes is a cause for concern (cf. Busbin &
Campbell, 1990; Marquez, 2008). Busbin and Campbell (1990)
suggest the following explanations for low participation: (1) people
may rationalise their health problems and believe that ‘it will happen to
someone else’; (2) they may be resistant to such a change in the
employer-employee relationship; (3) they may not be willing to make
the necessary lifestyle changes; and (4) they may believe that a good
medical aid is sufficient, thus relying on treatment interventions rather
than preventative actions. In this regard, it is useful to note that the
Mercer 2011 National Survey of Employer-Sponsored Health Plans
(Sinclair, 2012) shows considerably higher participation rates when
employers provide incentives. Incentives include, for example, lower
premium contribution for medical aid and even cash. Baicker et al.
(2010) reported incentives in 30 per cent of the programmes included
in their meta-analysis.
A typical wellness programme will ideally comprise activities
encompassing all the essential elements of well-being, particularly
health awareness and promotion, chronic disease management, and
preventative programmes. Furthermore, both mental and physical
health will receive attention in a wellness programme. Occupational
health and safety concerns are also addressed in these programmes.
Specific aspects include, among others, screenings for certain
cancers, high cholesterol and elevated blood sugar levels; on-site
gymnasiums; support groups for substance abusers; and chronic
disease management (for diseases such as HIV/Aids, and diabetes).
An important component of a wellness programme is the Employee
Assistance Programme (EAP). This programme offers specific support
to employees who, for example, have been traumatised by crime; who
are recently divorced or are experiencing other family-related
problems; and who suffer from stress and burnout. Companies will
typically utilise counsellors and psychologists (at no cost to the
employee) to provide this support. The service may be offered on-site
or off-site (cf. Cascio, 2006). Examples of other activities include
support with time management, disability support, and financial life-
skills training. You can clearly see from the above examples that a
wellness programme encompasses all aspects of an employee’s well-
being.
Baicker et al. (2010) found the most common interventions to be
health-risk assessments followed by self-help (e.g. reading material),
individual counselling, and on-site group activities. In terms of the
most common health issues addressed across all the programmes,
smoking and obesity were top of the list. The majority of programmes
concentrated on weight loss and fitness, about half concentrated on
smoking, and 80 per cent addressed several health-risk elements.

❱❱ CRITICAL THINKING CHALLENGE 11.2

Imagine you are an industrial psychologist working in the human
resources division of a South African university. Develop a programme
for a ‘Wellness Week’ to be held at your institution. Be sure to include
activities that relate to all the key components of wellness at work.

11.4 Well-being among university students
A particular population in which well-being is very important is
university students. Studying at a tertiary institution is usually a life-
changing, positive experience. Nevertheless, it also brings challenges
of its own. Some of the challenges faced by tertiary students are the
pressures to succeed and achieve in academics, the dynamics of
adult romantic and social relationships, and financial pressures. In
addition, other life challenges are also still present, including family
dysfunction, mental illnesses such as depression, and chronic
physical illnesses. Among South African students there are also
unique factors that impact well-being, amongst others our socio-
political context (cf. Taljaard-Plaut & Strauss, 1998). The latter relates
to elements like violence, poverty, and inequality.
Psychological well-being occurs in, and is influenced by, multiple
life domains. These include intrapersonal, interpersonal, social, and
spiritual domains. To investigate well-being of tertiary students, it is
therefore important to view it systemically. All domains of a student’s
life can influence his or her well-being and should therefore be taken
into consideration when addressing student wellness.
Research shows support for the application of a holistic wellness
model when considering student well-being (e.g. Van Lingen, Douman &
Wannenburg, 2011). In addition, recent studies have applied a positive
psychology approach in terms of student well-being. In this regard,
Van Zyl and Rothmann (2012) investigated flourishing among
university students, while Morris-Paxton, Van Lingen and Elkonin
(2017a, 2017b) explored a salutogenic lifestyle programme among
disadvantaged university students. Salutogenesis refers to the belief
that it is possible to reach a state of well-being (and not illness)
(Morris-Paxton et al., 2017a). Important features of a salutogenic
approach are (1) an emphasis on assessing health rather than
diagnosing illness; (2) considering stress and stressors as normal; (3)
emphasising that which will enhance hardiness and adaptability; and
(4) paying particular attention to factors that will enhance well-being in
its broadest sense (Morris-Paxton et al., 2017a). A salutogenic
lifestyle approach is therefore an approach that will work from a
positive perspective to enhance health and well-being in all spheres of
living.
Van Lingen et al. (2011) identified the following domains that
contribute to a holistic view of student well-being: physical, career,
intellectual, environmental, social, emotional, and spiritual wellness.
The table below is adapted from Van Lingen et al. (2011, p. 401) and
offers descriptors for each of the domains.
Table 11.1 Domains of a holistic view of student well-being

Physical wellness: Exercise, nutrition, self-care, safety behaviours,
risk avoidance
Career wellness: Career choice, ongoing professional development,
career competence
Intellectual wellness: Intellectual challenge, gaining knowledge,
critical and innovative thinking
Environmental wellness: Environmental safeguarding
Social wellness: Meaningful relationships, social skills, giving and
receiving caring, tolerance, respect for individual differences
Emotional wellness: Accepting, understanding and managing own
emotions, realistic self-appraisal and self-esteem, stress management
Spiritual wellness: Meaning and purpose in life, values, spiritual
awareness and practices

Morris-Paxton et al. (2017a, 2017b) utilised the above holistic
wellness model in their study. More specifically, they offered a salutogenic, holistic
wellness programme (a lifestyle management programme) to students
who were willing to participate in the study. The programme aimed to
operationalise important principles of salutogenesis, namely promoting
wellness, assessing wellness, building resilience, teaching coping
skills, and focusing on improving all-encompassing wellness. The
results of their study showed that the wellness programme did
improve students’ well-being. Results further showed a significant
correlation between wellness and academic performance, which points
to the positive spillover effect of psychological well-being. The
correlation was also stronger for students who made two to three
lifestyle changes. Lifestyle changes occurred, for example, in stress
management, fluid intake, time management, confidence, and
changes in attitude. The study demonstrates that even seemingly
small changes add up, and can have a positive impact on well-being
and subsequent academic performance.
Van Zyl and Rothmann (2012) investigated the relationship
between flourishing and academic performance among university
students. Keyes (2005) conceptualises flourishing as consisting of two
broad aspects. Firstly, hedonia is marked by traits such as a good
mood, being calm and happy, and satisfaction with life. Secondly,
positive functioning has descriptors such as: positive attitudes towards
self and others; good insight into self and others; belief in actualisation
of self and others; goal-directed behaviours; a sense of
meaningfulness; self-directed behaviours; a sense of belonging; and
warm, trusting relationships. These are the emotions and behaviours
of people with high levels of well-being. When a person lacks many of
these emotions and behaviours, and is not doing well, Keyes (2005)
describes the person as languishing. Van Zyl and Rothmann (2012)
summarised the findings of a number of studies that show that well-
being impacts academic performance. Their results showed that
students with moderate levels of flourishing obtained above-average or
higher academic marks, whereas a large number of students who
were languishing tended to under-perform. However, it should be
borne in mind that third variables, such as intellectual ability, were not
factored into the relationship in their study. Therefore, although the
literature shows strong evidence that well-being influences academic
performance, the relationship is intricate and involves many variables.

11.5 Measures of well-being


In this section you will be introduced to measurement tools that can be
utilised to measure various constructs relating to well-being. The
section is divided into two parts. The first part contains measuring
instruments that are applied to assess aspects of well-being in diverse
contexts. The second part focuses on measurement tools that
measure aspects of well-being at work.

11.5.1 Assessment of well-being in diverse contexts


The following measuring instruments are used to assess different
aspects of well-being:
Institute for Personality and Ability Testing Anxiety Scale (IPAT)
The IPAT (Cattell & Scheier, 1961) is a 40-item questionnaire aimed
at providing an assessment of free-floating manifest anxiety (in
other words, anxiety that is not linked to specific situations). It has
been adapted for use in South Africa (Cattell, Scheier & Madge,
1968). Specifically, it has been standardised for use in some South
African populations, and norms are available for white English- and
Afrikaans-speaking adolescents and first-year university students.
Although not yet standardised for black South Africans, it has been
used successfully in multicultural samples (cf. Hatuell, 2004; Mfusi &
Mahabeer, 2000).
State-Trait Anxiety Inventory (STAI)
The STAI consists of 40 items and measures anxiety as a state and
as a trait (Spielberger, Gorsuch & Lushene, 1970). State anxiety
refers to the ‘transitory feelings of fear or worry that most of us
experience on occasion’, whereas trait anxiety refers to ‘the relatively
stable tendency of an individual to respond anxiously to a stressful
predicament' (Gregory, 2007, p. 384). The two terms are related, as a
person’s trait anxiety is indicative of the likelihood that he or she will
experience state anxiety (Gregory, 2007). No norms for the STAI have
been established in South Africa as yet, which considerably limits its
use. In addition, Gregory (2007) mentions that results of the STAI
should be interpreted cautiously as it is highly face-valid, which makes
it easy to fake responses. The STAI has been used successfully in
research in South African samples (e.g. Burke, 2008; Moosa, 2009).
The Affectometer-2 Scale (AFM2)
The Affectometer-2 Scale (AFM2) was developed by Kammann and
Flett (1983) to measure general happiness or well-being by evaluating
the balance between positive and negative recent feelings. The 40
items of the scale are divided into two 20-item subscales. There are
10 negative and 10 positive items in each subscale. Subtotal scores
are obtained for Negative Affect (NA) and Positive Affect (PA). An
overall score is obtained by subtracting the NA score from the PA
score. Overall scores higher than 40 indicate positive subjective well-
being (i.e. happiness), while scores below 40 indicate lower levels of
subjective well-being (i.e. unhappiness). The AFM2 has been used
with success in South African studies (e.g. Bosman, Rothmann &
Buitendach, 2005; Steyn, Howcroft & Fouché, 2011), and a Setswana
version has been shown to be valid and reliable (Wissing et al., 2010).
The Positive Affect Negative Affect Scales (PANAS)
The Positive Affect Negative Affect Scales (Watson, Clark & Tellegen,
1988) are similar to the AFM2 as they also measure negative and
positive affect. According to Watson et al. (1988), positive affect refers
to the extent of feelings such as attentiveness and enthusiasm, while
negative affect is a ‘general dimension of subjective distress and
unpleasurable engagement’ (p. 1063), and includes feelings such as
anger, scorn, and fear. The scale comprises words that express
emotions. Test-takers have to indicate the extent to which they
experience these emotions during certain times (e.g. this moment,
today, the past few weeks) on a five-point Likert scale ranging from
very slightly or not at all (1) to extremely (5). The PANAS have been
used in a number of South African studies (e.g. Getz, Chamorro-
Premuzic, Roy & Devroop, 2012; Nolte, Wessels & Prinsloo, 2009;
Strümpfer, 1997), though not as extensively as the AFM2.
General Health Questionnaire (GHQ)
The GHQ (Goldberg, 1978) is a widely used screening questionnaire
aimed at detecting common symptoms indicative of psychiatric
disorders in non-psychiatric contexts (e.g. primary healthcare and
communities) (cf. Temane & Wissing, 2006). There are four versions
available; the number given to each version denotes the number of
items in it: GHQ-12, GHQ-28, GHQ-30 and GHQ-60. All four versions
utilise a four-point Likert-type response scale. The GHQ-28 has four
subscales (A−D) that measure the following aspects: Scale A
measures somatic symptoms; Scale B measures anxiety and
insomnia; Scale C measures social dysfunction; and Scale D
measures severe depression (Goldberg & Hillier, 1979). The various
versions of the GHQ are used widely in South African studies and
show acceptable psychometric properties (e.g. Malebo, Van Eeden &
Wissing, 2007; Moch, Panz, Joffe, Havlik & Moch, 2003; Mosam,
Vawda, Gordhan, Nkwanyanat & Aboobaker, 2004; Temane &
Wissing, 2006). The GHQ-28 has been translated into Afrikaans
(Ward, Lombard & Gwebushe, 2006) and Setswana (Keyes et al.,
2008).
Beck Depression Inventory (BDI)
The BDI is a 21-item self-report inventory aimed at assessing the
severity of depressive symptoms in adults and in children from 13
years of age onwards (Beck, Steer & Brown, 1996). The items in the
inventory are aligned with the diagnostic criteria for depression as
listed in the Diagnostic and Statistical Manual of Mental Disorders,
Fourth Edition (DSM-IV); these diagnostic criteria remain the same in
the latest (fifth) edition of the DSM. The BDI has cut-off scores that are
used to diagnose depression. It is not yet standardised for use with
different populations in South Africa, and although studies were
underway (e.g. Steele & Edwards, 2002, 2008), standardisation is yet
to be completed.
Depression, Anxiety and Stress Scale (DASS)
The DASS scale (Lovibond & Lovibond, 1995) was originally a 42-item
scale that measured depression, anxiety, and stress in clinical and
non-clinical samples. Since then, a 21-item version was also
developed. In developing the scale, the frame of reference was not a
categorical one (such as that found in the DSM). Rather, anxiety,
stress, and depressive symptoms were perceived as lying on a
continuum. Hence, the difference between clinical and general samples
is not the presence or absence of a diagnosis of depression, but
rather the extent and severity of symptoms (Lovibond & Lovibond, 1995).
Cut-off scores are available to determine the extent of depression,
from normal to extremely severe (Lovibond & Lovibond, 1995). The
DASS is scored on a four-point Likert-type scale, with response
options ranging from ‘did not apply to me at all’ to ‘applied to me very
much or most of the time'. The DASS has been validated in numerous
countries, in both its English version and other language versions (cf.
Smith, Henn, & Hill, 2018, for relevant references). The DASS-21 has
been used in South African studies (e.g. Van Zyl et al., 2017;
Thuynsma & De Beer, 2017). The scale was validated for use in non-
clinical samples in South Africa, and has shown strong evidence of
construct, discriminant, and convergent validity, as well as good
reliability (Smith, Henn & Hill, 2018).
Centre for Epidemiologic Studies Depression Scale (CES-D
and CESD-R)
The CES-D (Radloff, 1977) was developed to measure depressive
symptoms in the general population. It has been validated in
psychiatric populations as well as the general population, and Radloff
(1977) reported high internal consistency reliability and confirmed
validity. The scale is a 20-item Likert-type scale. Individuals have to
indicate how often in the last week they have experienced particular
symptoms, with four response options ranging from ‘rarely/none of the
time (less than one day)’ to ‘most or all of the time (five to seven
days)’ (Radloff, 1977). It is still used in South African studies (e.g.
Kitshoff, Campbell & Naidoo, 2012; Tomita, Labys, & Burns, 2014).
Baron, Davies and Lund (2017) validated Afrikaans-, isiZulu- and
isiXhosa-language versions of a 10-item CES-D.
A revised version of the CES-D, called the CESD-R, was
developed by Eaton, Smith, Ybarra, Muntaner and Tien (2004). The
aim was to develop items that more closely reflect the DSM-IV
diagnostic criteria for major depression than the CES-D. Eaton et al.
(2004) validated the scale, and subsequently the scale was also
validated by Van Dam and Earlywine (2011). The CESD-R has been
validated for use in South Africa, and showed good reliability,
construct validity (confirming a one-factor model), as well as
discriminant and convergent validity (Michas, 2018). It has also been
used in research in South Africa (e.g. Smith, 2016). The CESD-R can
be accessed on the Internet, and includes an interpretation of scores
that can be emailed to test-takers, testers, and researchers. A score
categorisation is also offered (such as 'possible major depressive
episode') (Eaton, Ybarra & Schwab, 2011). You can visit the Web site
(http://cesd-r.com/cesdr/) for more information.
Generalised Anxiety Disorder scale (GAD-7)
Spitzer, Kroenke and Williams (2006) developed a short, seven-item
questionnaire to screen for generalised anxiety disorder in clinical
settings. They report good reliabilities and scale validity. The scale
asks respondents to indicate how often they have been bothered by
particular symptoms in the past two weeks. Respondents answer on a
four-point Likert-type response scale ranging from ‘not at all’ to ‘nearly
every day’. The scale has been utilised in South African studies in
clinical (e.g. Outhoff, 2010) and non-clinical (e.g. Smith, 2016)
settings. Validation studies in non-clinical samples are underway in
South Africa.
Sense of Coherence (SOC)
The SOC (Antonovsky, 1987) is a 29-item scale that measures sense
of coherence in the areas of comprehensibility, manageability, and
meaningfulness. The SOC utilises a seven-point Likert-type scale with
two anchor responses whose descriptors differ from item to item. Examples of
questions are ‘When you talk to people, do you have the feeling that
they don’t understand you?’ (1 = never have this feeling, 7 = always
have this feeling), and ‘When you have a difficult problem, the choice
of the solution is …’ (1 = always confusing and hard to find, 7 = always
completely clear). Responses to items are summed, with higher
scores indicating a stronger sense of coherence. For more information
on the SOC, refer to Antonovsky (1979, 1987, 1996); Fourie,
Rothmann, and Van de Vijver (2008); and Strümpfer and Wissing
(1998).
The Satisfaction with Life Scale (SWLS)
The Satisfaction with Life Scale (SWLS) is a five-item instrument used
to measure life satisfaction (Diener, Emmons, Larsen & Griffin,
1985). The SWLS is designed around the idea that one should ask
respondents about the overall judgement of their life in order to
measure the concept of life satisfaction. Scores on the SWLS range
from 5 to 35, with higher scores indicating greater life satisfaction.
Wissing, Du Toit, Wissing and Temane (2008) explored the
psychometric properties of the Setswana translation of this scale and
found it both valid and reliable. For more South African studies, see
Heyns, Venter, Esterhuyse, Bam and Odendaal (2003); Westaway
and Maluka (2005); Steyn et al. (2011); Le Roux and Kagee (2008);
and Ungerer and Strasheim (2011).
The Life Orientation Test – Revised (LOT-R)
The LOT-R, a 10-item measure, was developed by Scheier, Carver,
and Bridges (1994) to measure individual differences in generalised
optimism versus pessimism. Six items contribute to the optimism
score, and four items are fillers. The LOT-R measures a continuum of
high, average, and low optimism/pessimism. The LOT-R has been
used in South African studies and showed acceptable internal
consistencies (see Coetzer & Rothmann, 2004).
Experience of Work and Life Circumstances Questionnaire (WLQ)
This scale was developed in South Africa to measure stress levels, as
well as causes of stress, in individuals with reading and writing skills
equivalent to a Grade 10 level (Van Zyl & Van der Walt, 1991). The
WLQ consists of three scales: Scale A measures levels of stress; Scale
B measures possible sources of stress at work (including physical
working conditions, fringe benefits, and career matters), and Scale C
measures 16 possible sources of stress outside of an individual’s work
(for example, family problems, phase of life, and financial issues). This
questionnaire has been used successfully in research with South
African samples, some of which are multicultural (Nell, 2005;
Oosthuizen & Koortzen, 2007; Van Rooyen, 2000; Van Zyl, 2004).
The Locus of Control Inventory (LCI)
The LCI (Schepers, 2005) is a normative scale measuring locus of
control. The scale consists of 65 items and measures three
constructs, namely internal control, external control, and autonomy.
Internal control refers to a person’s belief that his or her behaviour is
self-determined. People with high internal control believe, among
other things, that they determine the outcomes of matters and that their
achievements are the result of hard work and commitment. People
with high external control believe that aspects such as random events
and fate determine their behaviour. Such people believe, for example,
in fatalism, and that their life is to a large extent influenced by
coincidence. Autonomy refers to the extent to which individuals 'seek control
of situations that offer possibilities of change, take initiative in
situations requiring leadership, prefer to work on their own, and
choose to structure their own work programme’ (Schepers, 2005, p.
25).
The psychometric properties of the scale are very good, and Afrikaans,
English and Zulu versions are available. The available norms are
suitable for all four population groups, namely black, white, coloured,
and Indian South Africans. It is also suitable for use among students
and adults.
McMaster Health Index Questionnaire (MHIQ)
This questionnaire is the result of an attempt to develop a measure of
the quality of life of medical patients that conforms to the World
Health Organisation's definition of health (cf. Chambers, 1993). The
MHIQ includes physical, emotional, and social aspects in its
assessment of an individual’s quality of life. The measure has not yet
been standardised for use with South African populations.
Wellness Questionnaire for Higher Education
The construct of wellness, and the measurement thereof, proved useful
in higher-education institutions in the US for understanding student
needs and guiding development and intervention programmes. As
there was no comparable measure in South Africa, the Wellness
Questionnaire for Higher Education (WQHE) has been developed
(Gradidge & De Jager, 2011). It assesses physical, intellectual, social,
environmental, occupational, financial, emotional, and spiritual
dimensions of wellness. Results from the WQHE can be used when
engaging a student in individual counselling or in group contexts when
fostering holistic student development, and also to guide student
engagement in learning.
Ryff’s Psychological Well-being Scales (PWB)
Carol Ryff (1989) developed scales to measure the six dimensions of
well-being that she identified from theory (refer to earlier in the chapter
where the six dimensions are explained). The original version contains
84 items, and is scored on a six-point Likert-type rating scale with
responses ranging from ‘strongly disagree’ to ‘strongly agree’. Ryff
(1989) found initial evidence of construct validity, and also reported
good reliabilities. Several studies have examined the factor structure
of the Ryff PWB scales, but the results remain inconclusive, with
items in the various studies not loading on the dimensions in the same
way as in Ryff's original scale (e.g. Henn, Hill &
Jorgensen, 2016; Kafka & Kozma, 2002; Triado, Villar, Solé &
Celdrán, 2007; Van Dierendonck, 2004). Nevertheless, the scales
have been used in many South African studies (to name but a few:
Edwards & Edwards, 2009; Nolte, Steyn, Kruger & Fletcher, 2016;
Mboweni, 2007; Wigget, 2006; and Jones, Hill & Henn, 2015). The
scales provide valuable information about psychological well-being,
and given the widespread use of the Ryff PWB scales, more South
African validation studies are crucial.

11.5.2 Assessment of well-being in the work context


The following measurement tools can be utilised to assess various
aspects of well-being that relate to the world of work.
Sources of Work Stress Inventory (SWSI)
The SWSI (De Bruin & Taylor, 2005) is a South African-developed
questionnaire aimed at measuring occupational stress as well as
identifying possible sources of work stress. The first section (General
Work Stress Scale) assesses the extent to which work itself is a
source of stress. The second section (Sources of Job Stress Scale) consists of
nine sources of job stress. These sources are (De Bruin & Taylor,
2005):
Role ambiguity: stress experienced due to aspects such as constant
revamping and adjusting of expectations and unclear job descriptions.
Relationships: stress experienced due to, for example, bad
relationships with colleagues and managers.
Workload: stress experienced due to, for example, a perceived
inability to cope with work demands.
Autonomy: stress experienced due to insufficient empowerment and
lack of freedom to make decisions.
Bureaucracy: stress experienced due to rigid and strictly controlled
rules, procedures, and protocols.
Tools and equipment: stress experienced due to faulty or broken
machinery, intricate machinery, and inadequate or inappropriate tools
to do the job.
Physical environment: stress caused by environmental factors such as
extreme temperatures, high noise levels and poor lighting.
Career advancement/job security: stress caused by a perceived lack
of options/chances to further one’s career (career advancement) or
uncertainty in terms of one’s future at one’s place of work (job
security).
Work/home interface: stress experienced due to aspects such as
insufficient social support outside the workplace, and work-family
conflict.
Maslach Burnout Inventory (MBI)
The MBI (Maslach & Jackson, 1981) measures the burnout of
individuals. Three versions of the MBI exist, namely the MBI-GS
(General Survey), the MBI-ES (Educators), and the MBI-HSS (Human
Services Survey) (Maslach, Jackson, Leiter, Schaufeli & Schwab, n.d.).
The General Survey measures burnout in a broad range of professions,
whereas the ES and HSS versions of the MBI measure burnout in the
education and human-services contexts respectively
(Naude & Rothmann, 2004). The MBI-GS consists of 16 items and
measures burnout in terms of Exhaustion, Cynicism, and Professional
efficacy. The MBI-ES and MBI-HSS consist of 22 items and measure
Emotional exhaustion, Depersonalisation, and Personal accomplishment.
Emotional exhaustion refers to feeling tired and fatigued by work, and
'no longer able to give of themselves at a psychological level'
(Maslach & Jackson, 1981, p. 99). Depersonalisation refers to
feelings of detachment and aloofness towards people one relates to at
work (e.g. clients), and Personal accomplishment refers to feelings of
doing one’s work competently (Maslach & Jackson, 1981). Together
these subscales provide a three-dimensional perspective on burnout.
High scores on Exhaustion/Emotional exhaustion and
Cynicism/Depersonalisation and low scores on Professional
efficacy/Personal accomplishment are indicative of burnout.
Responses are captured on a seven-point Likert-type scale (Maslach,
Jackson & Leiter, 1996) ranging from ‘never’ to ‘every day’. Although
the MBI has been utilised in several South African studies, it has not
been standardised yet. It is thus mostly utilised for research purposes.
For more information on the MBI, see Maslach, Jackson, and Leiter
(1996); and Worley, Vassar, Wheeler and Barnes (2008).
South African Burnout Scale
Asiwe, Jorgensen and Hill (2014) developed a South African burnout
scale. The scale measures three dimensions of burnout, namely
cognitive weariness, fatigue and emotional exhaustion/withdrawal.
The scale consists of 17 items and is scored on a seven-point scale
with response options ranging from ‘always’ to ‘never’. Asiwe et al.
(2014) found good scale reliability as well as construct equivalence for
black and white participants and for African and Afrikaans language
groups. The scale was developed in an agricultural sample, and more
validation studies are required. Yet it shows promise, and has
been used in another South African study (Birkenstock, 2017) in which
it performed well.
The Utrecht Work Engagement Scale (UWES)
The UWES measures the levels of work engagement of university
students and adults (Schaufeli & Bakker, 2003). Three dimensions of
work engagement can be distinguished, namely Vigour, Dedication
and Absorption. Vigour is characterised by high energy levels and
mental resilience while working, the willingness to invest effort in one’s
work, and persistence even in the face of difficulties. Dedication is
characterised by a sense of significance, enthusiasm, inspiration,
pride, and challenge. Absorption is characterised by being totally and
happily immersed in one’s work, to the extent that it is difficult to
detach oneself from it. Engaged individuals are characterised by high
levels of Vigour and Dedication and also elevated levels of Absorption.
Three versions of the UWES exist, differing in the number of items:
the UWES-17, the UWES-15, and the UWES-9. The UWES has been used in
several South African studies.
For more information on the UWES, visit this Web site:
http://www.schaufeli.com. You can also access the questionnaire and
its manual on this Web site.
Minnesota Satisfaction Questionnaire (MSQ)
The MSQ (Weiss, Dawis, England & Lofquist, 1967) is used in the
assessment of job satisfaction. It taps affective responses to various
aspects of one’s job. A person can be relatively satisfied with one
aspect of his or her job while being dissatisfied with one or more of the
other aspects. The revised MSQ form measures intrinsic job
satisfaction and extrinsic job satisfaction, using items such as: 'The
chance to be “somebody” in the community' (intrinsic), 'The chance to
do something that makes use of my abilities’ (intrinsic), ‘My pay and
the amount of work I do’ (extrinsic), and ‘The working conditions’
(extrinsic) (Hirschfeld, 2000, p. 258). The item content is of such a
nature that it can be applied to a variety of organisations and
occupational positions.
Work Locus of Control Scale (WLCS)
The WLCS consists of 16 items and was developed by Spector (1988)
to measure the work locus of control of individuals. The measure is
composed of eight items designed to tap internal control and eight
items tapping external control. A sample internal control item is ‘A job
is what you make of it.’ A sample external control item is ‘Getting the
job you want is mostly a matter of luck.’ High scores are indicative of
an external locus of control. The scale has also been utilised in South
African studies and showed acceptable internal consistency (see
Botha & Pienaar, 2006).

11.6 Conclusion
The concept of ‘well-being’ is central to the discipline and profession
of psychology. In this chapter, well-being and its subdimensions were
considered from a theoretical perspective, as well as in terms of the
range of available measures used to assess them.

CHECK YOUR PROGRESS 11.1


11.1 Differentiate between, and describe, eudaimonic and
hedonistic conceptions of well-being.
11.2 Explain the importance of well-being from the perspective of
the employer, as well as from the employee’s perspective.
11.3 Name and describe measurement tools that can be utilised to
assess the various constructs related to well-being.
❱❱ ETHICAL DILEMMA CASE STUDY 11.1

You are an industrial psychologist working at a financial-services provider. As
part of your job description you have designed a comprehensive wellness
programme. In an initial health screening you have included the Beck Depression
Inventory as part of a mental health assessment battery. One of the participants,
Hannah Hudson, scored very high on this inventory and this, coupled with other
assessment findings, indicated that she exhibits many of the signs of depression.
As a result you have referred her to a counselling psychologist as well as a
psychiatrist for further assessment and possible treatment. Hannah reported to
you that she has been in therapy with the psychologist, that she is on medication
for major depression, and that her mental health is improving. About six weeks
later you are requested by an executive to administer an assessment battery with
the purpose of ranking three shortlisted applicants for a senior management
position in the organisation. One of the candidates is Hannah Hudson.

Questions
1. To what extent should the information you have about Hannah’s psychiatric
condition affect your final report and the ranking of the candidates? Indicate all
the aspects that would influence you in this regard, including ethical aspects.
2. Should you disclose your knowledge about Hannah’s psychiatric
condition (a) in the written report to the executive and/or (b)
verbally to the executive concerned? Frame your answer in
terms of good, ethical assessment practice guidelines.

ADDITIONAL READING

Sheldon, K.M., Kashdan, T.B., & Steger, M.F. 2011. Designing positive psychology:
Taking stock and moving forward. Oxford: Oxford University Press.
Cooper, C.L. 2011. Organisational health and wellbeing. Los Angeles: Sage.

Bakker, A.B. & Leiter, M.P. 2010. Work engagement: A handbook of essential theory
and research. Hove: Psychology Press.
Seligman, M.E.P. 2003. Authentic happiness: Using the new positive psychology to
realise your potential for lasting fulfilment. London: Nicholas Brealey.
Csikszentmihalyi, M. & Csikszentmihalyi, I.S. 2006. A life worth living: Contributions to
positive psychology. New York: Oxford University Press.
Feldman, F. 2010. What is this thing called happiness? New York:
Oxford University Press.
Diener, E. 2009. Assessing well-being: The collected works of Ed Diener. New York:
Springer.
Leiter, M.P. & Maslach, C. 2005. Banishing burnout: Six strategies for improving your
relationship with work. San Francisco, California: Jossey-Bass.

Did you find this an interesting stop in the Types of Measures Zone?
The next stop should prove to be just as stimulating, as you will be
given an opportunity to explore how personality functioning can be
assessed. Assessing what makes people tick (personality make-up) is
fundamental to the discipline and profession of psychology, so it is
something that most psychology students are keen to learn more
about.
Personality assessment
NICOLA TAYLOR AND GIDEON P. DE BRUIN
Chapter 12
CHAPTER OUTCOMES
By the end of this chapter you should be able to:
❱ demonstrate a conceptual understanding of personality and personality assessment
❱ distinguish between structured and projective personality-assessment
methods
❱ critically describe the main assumptions underlying the use of the structured
assessment methods and discuss the challenges associated with the cross-
cultural use of such methods
❱ critically describe the main assumptions underlying the use of projective
tests such as the Thematic Apperception Test (TAT) and discuss the
problems associated with the cross-cultural use of the TAT.

12.1 Introduction
Psychology professionals in industrial, psychodiagnostic, forensic,
educational, and other contexts are often requested to assess an
individual’s personality (see Chapter 15). Usually the purpose of such
an assessment is to understand the uniqueness of the individual.
Psychologists also use personality assessment to identify the
individual’s characteristic strengths and weaknesses, and his or her
typical way of interacting with the world and the self. There are many
different methods and procedures that psychologists may use for
personality assessment. The choice of method usually depends on (a)
the reason for the assessment, (b) the psychologist’s theoretical
orientation, and (c) the psychologist’s preference for particular
methods and procedures. In this chapter you will be introduced to how
to conceptualise personality and its subdimensions, then different
methods used to assess personality, and specific assessment
measures of personality functioning that psychologists typically use in
practice.

12.2 A conceptual scheme for personality assessment
The American personality psychologist Dan McAdams argued that
personality can be described and understood on at least three levels
(McAdams, 1995). Briefly, the first level focuses on personality traits
or basic behavioural and emotional tendencies, and refers to the
stable characteristics that the person has. The second level focuses
on personal projects and concerns, and refers to what the person is
doing and what the person wants to achieve. This level is, to a large
degree, concerned with an individual’s motives. On the third level, the
focus falls on the person’s life story or narrative, and refers to how the
person constructs an integrated identity. These three levels are not
necessarily related to each other and do not neatly fold into each
other. As the different levels are also not hierarchically arranged, all
the levels have equal importance (McAdams, 1994). For a full
understanding of a person it is probably necessary to focus on all
three levels as set out in McAdams’ scheme.
This scheme provides a comprehensive framework for personality
assessment. Depending on the level of description on which the psychologist wishes to focus, different assessment techniques or measures may be selected. In this chapter we focus on the first two levels of McAdams’
scheme. It is only on these two levels that the assessment measures
and techniques that psychologists typically use for personality
assessment are relevant. The third level focuses on the individual’s
identity and life story. Assessment on this level refers to the way in
which the individual constructs his or her identity and falls outside the
domain of what is commonly thought of as personality assessment.
Structured methods of personality assessment usually refer to
standardised questionnaires, inventories, or interviews that contain a
fixed set of statements or questions. Test-takers often indicate their
responses to these statements or questions by choosing an answer
from a set of given answers, or by indicating the extent to which the
statements or questions are relevant to them. Structured methods are
also characterised by fixed scoring rules. In this chapter structured
methods are used to illustrate assessment on the first level in
McAdams’ scheme.
Projective assessment techniques are characterised by
ambiguous stimuli, and individuals are encouraged to provide any
response to the stimuli that comes to mind. It is assumed that these
responses reflect something of the individual’s personality. Projective
techniques are often scored and interpreted in a more intuitive way
than structured personality questionnaires and inventories, and are
used to assess the second level of personality as set out by
McAdams.
One potentially major advantage of projective techniques is that
the person who is being assessed may find it very difficult to provide
socially desirable responses. This is often a problem in structured
personality measures, because it may be obvious from the content of
the items or the questions what the psychologist is attempting to
assess and what the socially desirable or appropriate answer would
be. For example, suppose that when being assessed with regard to your suitability for a job, you are asked to respond either ‘Yes’ or ‘No’ to the following item on a personality measure: ‘I tell lies sometimes’. You will feel
fairly pressurised to answer ‘No’, as you will reason that your potential
employers are not looking to employ someone who lies. If you
succumb to the pressure and answer ‘No’, you would have responded
in a socially desirable way. Furthermore, if you continue to respond in
this way on the measure, a pattern of socially desirable answers will
be discernible (this is often referred to as a response set). Another
response set that can be problematic in objective personality
measures is that of acquiescence. This is the tendency to agree rather
than to disagree when you doubt how best you should respond to a
question. See Sections 4.2.3.1 and 6.2.1.4 in Chapters 4 and 6
respectively for further discussion on response sets and bias.
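To make the idea concrete, the short sketch below (in Python, purely for illustration; the items, answer key, and threshold are invented and are not part of any published scoring procedure) flags the two response sets just described from a set of yes/no answers:

# Hypothetical illustration: flagging possible response sets in yes/no data.
# 'desirable_key' gives, for each item, the socially desirable answer.

def flag_response_sets(answers, desirable_key, threshold=0.9):
    """answers and desirable_key are equal-length lists of 'Yes'/'No' strings."""
    n = len(answers)
    desirable_rate = sum(a == k for a, k in zip(answers, desirable_key)) / n
    yes_rate = sum(a == "Yes" for a in answers) / n  # acquiescence check
    return {
        "social_desirability_suspected": desirable_rate >= threshold,
        "acquiescence_suspected": yes_rate >= threshold,
    }

# 'No' to 'I tell lies sometimes' would be the socially desirable answer.
answers = ["No", "Yes", "No", "No", "Yes"]
desirable_key = ["No", "Yes", "No", "No", "Yes"]
print(flag_response_sets(answers, desirable_key))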
Due to limited space, it is not possible to discuss all assessment
procedures and instruments that are available in South Africa. Rather
than providing a superficial overview of all available measures, we will
focus on measures and techniques that (a) are commonly used both
in South Africa and internationally, (b) have potential for use with all
cultural groups, and (c) illustrate important methods of personality
assessment.

❱❱ CRITICAL THINKING CHALLENGE 12.1


Describe what is meant by a socially desirable response set. Then write five
true/false items that you think will provide a good measure of this response set.

12.3 Level 1: The assessment of relatively stable personality traits
This level of understanding and describing personality focuses on
basic tendencies, attributes, or traits. Some personality psychologists
prefer to regard such basic tendencies or traits as predispositions to
behave in particular ways. From this perspective, someone with a trait
of, for instance, aggressiveness will be predisposed to behave in an
aggressive way. Likewise, somebody with a trait of extraversion will be
predisposed to seek out environments where, for example, a great
amount of social interaction is to be found.
Trait psychologists focus on internal factors that influence the way
in which people behave. It is assumed that these internal factors
influence behaviour across a variety of situations and over time. From
the above it can be seen that trait psychologists are interested in
describing consistencies in people’s behaviour. Trait psychologists are
also interested in stable differences between people. Trait researchers
ask questions such as ‘What are the basic dimensions of personality
in terms of which people differ from each other?’ and ‘Why is it that
people show relatively stable differences in terms of traits such as
aggressiveness and extraversion?’ Trait psychologists are also
interested in the real-life implications of such differences between
people. Accordingly, they attempt to demonstrate that traits can be
important predictors of criteria such as psychopathology, leadership,
marital and job satisfaction, career choice, and academic and work
success.
One of the problems with which trait psychologists are confronted
is the identification of a fundamental set of attributes or traits that may
be used to provide an economical but comprehensive description of
personality. Any casual perusal of a dictionary may reveal that there
are thousands of words to describe personality. It is also possible to
think of thousands of different sentences in which personality may be
described. However, many of these words and sentences show a
substantial amount of overlap and many refer to similar behaviours.
This overlap suggests that it may be possible to reduce the large
number of words and sentences that describe personality to a smaller
number of dimensions that contain most of the information included in
the original words and sentences. Similarly, it is possible that many of
the traits or attributes that might be used to describe people may
overlap and hence be redundant. In such a case it may be possible to
reduce the number of traits to a smaller number of more fundamental
traits without much loss in information.
Trait psychologists rely heavily on a technique called factor
analysis to identify basic dimensions that underlie sets of related
variables (such as overlapping words or sentences that may be used
to describe personality). Factor analysis is a powerful statistical
technique that can be used to reduce large groups of variables to a
smaller and more manageable number of dimensions or factors. In
this sense, factor analysis can be used as a summarising technique.

As an example, consider the following short list of words: talkative, social, bold, assertive, organised, responsible, thorough, and hardworking. Each of these words may be used to describe personality. Based on our observations we could say, for example, ‘Mr X is very talkative and social, but he is not very organised and responsible’. We could also rate Mr X in terms of the eight words in our list on, for instance, a three-point scale. On this hypothetical three-point scale, a rating of three indicates that the word is a very accurate descriptor, two indicates that the word is a reasonably accurate descriptor, and one indicates that the word is an inaccurate descriptor. By rating many different people in terms of these words, one may begin to observe certain patterns in the ratings. In our example it is likely that people who obtain high ratings for organised will also obtain high ratings for responsible, thorough, and hardworking. It is also likely that people who obtain high ratings for assertive will also obtain high ratings for talkative, bold, and social.

It therefore seems as if the original eight variables cluster into two groups and that the original variables may be summarised in terms of two basic or fundamental dimensions. These could perhaps be described as Extraversion (Outgoing) and Conscientiousness (Methodical).

In our example we used only eight variables and even a casual glance
at the words suggested they could be grouped according to two
dimensions. However, when the aim is to identify a comprehensive set
of basic personality dimensions, a much larger number of variables
that are representative of the total personality domain are needed. In
such a study it will not be possible to identify basic personality
dimensions by visual inspection of the variables alone. Factor analysis
helps researchers to identify statistically the basic dimensions that are
necessary to summarise the original variables. Factor analysis also
helps us understand the nature of the dimensions that underlie the
original variables. This is achieved by inspecting the nature or
meaning of the variables that are grouped together in the dimension or
factor. Although the aims of factor analysis can be relatively easily
understood on a conceptual level, the mathematics underlying the
procedure is quite complex. Fortunately, computers can perform all
the calculations, although researchers should have a thorough
knowledge of the mathematical model in order to use factor analysis
properly. Interested readers are referred to Gorsuch (1983) and Kline
(1994).
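To illustrate the idea numerically, the sketch below simulates ratings on the eight adjectives from the earlier example so that they depend on two latent traits, and then extracts two factors with an off-the-shelf routine. This is a minimal illustration on synthetic data, not a real analysis; the printed loadings for the first four words should cluster on one factor and those for the last four on the other.

import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n = 500
extraversion = rng.normal(size=n)        # latent trait 1
conscientiousness = rng.normal(size=n)   # latent trait 2

words = ["talkative", "social", "bold", "assertive",
         "organised", "responsible", "thorough", "hardworking"]
# The first four ratings depend on the first trait, the last four on the second.
latent = np.column_stack([extraversion] * 4 + [conscientiousness] * 4)
ratings = latent + 0.5 * rng.normal(size=(n, 8))

fa = FactorAnalysis(n_components=2, random_state=0).fit(ratings)
for word, load in zip(words, fa.components_.T):
    print(f"{word:12s} {load[0]:6.2f} {load[1]:6.2f}")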

❱❱ CRITICAL THINKING CHALLENGE 12.2

Consult a dictionary and write down 50 adjectives that refer to traits or characteristics of personality. Try to choose a mixture of positive and negative
terms. Be careful not to include adjectives that are synonyms (similar in meaning)
or antonyms (opposite in meaning). Put the list in alphabetical order and then tick
each adjective that you think would describe characteristics of your personality.
Summarise the information by trying to group together adjectives (which you
have ticked) that seem to have something in common and then try to find one
overarching name for each grouping. How accurately do you think that these
groupings describe your personality? What has been left out?

12.3.1 The 16 Personality Factor Questionnaire (16PF)


One of the most prominent trait psychologists of the twentieth century
was Raymond B. Cattell (see Chapter 2). He made extensive use of
factor analysis and identified a list of about 20 primary personality
traits. These traits are also often called factors. Cattell selected 16 of
these traits to be included in a personality questionnaire for adults,
which he called the 16 Personality Factor Questionnaire (16PF). The
16PF is one of the most widely used tests of normal personality in the
world (Piotrowski & Keller, 1989). The traits that are measured by the
16PF are listed in Table 12.1. You should note that all the traits are
bipolar; that is, at the one pole there is a low amount of the trait and at
the other pole there is a high amount of the trait.
Table 12.1 Personality factors measured by the 16PF

FACTOR                      LOW SCORE          HIGH SCORE

Primary Factors
A: Warmth                   Reserved           Warm
B: Reasoning                Concrete           Abstract
C: Emotional Stability      Reactive           Stable
E: Dominance                Cooperative        Dominant
F: Liveliness               Restrained         Lively
G: Rule-consciousness       Expedient          Rule-conscious
H: Social Boldness          Shy                Socially bold
I: Sensitivity              Utilitarian        Sensitive
L: Vigilance                Trusting           Suspicious
M: Abstractedness           Practical          Imaginative
N: Privateness              Forthright         Private
O: Apprehension             Self-assured       Apprehensive
Q1: Openness to change      Traditional        Open to change
Q2: Self-Reliance           Group-orientated   Self-reliant
Q3: Perfectionism           Unexacting         Perfectionistic
Q4: Tension                 Relaxed            Tense

Second-order Factors
EX: Extraversion            Introverted        Extraverted
AX: Anxiety                 Low anxiety        High anxiety
TM: Tough-mindedness        Receptive          Tough-minded
IN: Independence            Accommodating      Independent
SC: Self-control            Unrestrained       Self-controlled

Note: The factor labels and descriptors were taken from the 16PF5 (Conn & Rieke, 1998).

The 16 primary personality traits or factors are not completely independent of each other. By factor analysing the patterns of
relationships between the 16 primary traits, personality researchers
have identified at least five well-established second-order factors that
underlie the 16 primary factors (Aluja & Blanch, 2004; Krug & Johns,
1986; Van Eeden & Prinsloo, 1997). A sixth second-order factor is
also sometimes identified, but this factor is usually defined only by
Scale B, which measures reasoning rather than a personality trait.
These factors are called second-order factors because they are
derived from the relationships between primary or first-order factors.
The second-order factors are more general and relate to a broader
spectrum of behaviours than the primary factors. However, what is
gained in generality is lost in predictive power. Mershon and Gorsuch
(1988) and Paunonen and Ashton (1988) showed that primary factors
or lower-level personality traits allow for better prediction of future
behaviours than do second-order factors or higher-level personality
traits.
It is important to note that the development of the 16PF was not
guided by any particular psychological theory. Items and scales were
not selected because they correlated with important external criteria
such as psychopathology or leadership. Rather, the scales were
chosen and refined because they were identified through factor
analysis as representing important and meaningful clusters of
behaviours. This strategy of test development is called the factor-
analytic strategy.
The most recent version of the 16PF to be standardised in South
Africa is the 16PF – Fifth Edition (16PF5), which has replaced the
earlier Forms A, B, and SA92. The 16PF5 is used with adults who
have completed at least Grade 12 or its equivalent. Earlier forms of
the 16PF were often criticised for the low reliabilities of the 16 primary-
factor scales. In response to these criticisms, the reliabilities and
general psychometric properties of the 16PF5 are much improved
(Aluja & Blanch, 2004; Institute for Personality and Ability Testing,
2006; Schuerger, 2002).
Although the 16PF has been used extensively with all population
groups in South Africa, the evidence in support of the cross-cultural
use of the earlier versions of the questionnaire is not very strong.
Abrahams and Mauer (1999a, 1999b) reported that many of the 16PF
SA92 items appear to be biased. Their qualitative analysis
demonstrated that the meaning of many of the English words included
in the 16PF items is not clear to individuals who do not use English as
their first language. Van Eeden and Prinsloo (1997) also concluded in
their study into the cross-cultural use of the 16PF that the ‘constructs
measured by the 16PF cannot be generalised unconditionally to the
different subgroups’ (p. 158) in their study. The subgroups to which
they referred were (a) individuals using English and Afrikaans as their
first language, and (b) individuals whose first language is an African
language.
The 16PF5 (Cattell, Cattell & Cattell, 1993) has also been
adapted into South African English and Afrikaans versions (Institute
for Personality and Ability Testing, 2006). The factor structure of the
16PF5 appeared to replicate well in the South African version,
indicating alignment with the international versions. Schepers and
Hassett (2006) investigated the relationship between the 16PF5 and
the Locus of Control Inventory, and found three significant canonical
correlations between the two assessments that were interpreted as
Ascendancy, Emotional stability, and Rule-consciousness. Six global
factors were found for the 16PF5 that were identified as Liveliness,
Perfectionism, Dominance, Tension, Abstractedness, and Warmth. De
Bruin, Schepers, and Taylor (2005) reported that the second-order
factor structure of the 16PF5 is very similar for Afrikaans, English,
Nguni, and SeSotho groups, suggesting that it measures the same
broad personality traits across the different languages. Van Zyl and
Becker (2012) investigated the equivalence of the 16PF5 across
cultural groups and found good evidence for the cross-cultural
applicability of the 16PF5 in South Africa. This holds promise for the
continued use of the 16PF5 in South Africa.
It should be noted that the Institute for Personality and Ability
Testing in the United States also developed related personality
measures for children and adolescents that tap 14 primary
personality traits. The Children’s Personality Questionnaire (CPQ) was
mainly used in educational and clinical settings with children from 8 to
12 years of age, while the High School Personality Questionnaire
(HSPQ) was applied to adolescents from 12 to 18 years of age in
clinical, educational, and career-counselling contexts. These
assessments have both been discontinued. The Adolescent
Personality Questionnaire (APQ) replaced the HSPQ as a measure of
Cattell’s 16 factors for adolescents. There has been no research done
using the APQ in South Africa.
The 15 Factor Questionnaire (15FQ) was developed as an
alternative to the 16PF as it measures 15 of the 16PF factors while
excluding one (Factor B, which taps reasoning or intelligence). The 15
Factor Questionnaire Plus (15FQ+) (Psytech International, 2000) is
the updated version of the original 15FQ. The 15FQ+ assesses
normal personality functioning and is widely used for personnel
selection in South Africa (Moyo & Theron, 2011). While there is
research evidence regarding the construct validity of the 15FQ+ in the
South African context, there is concern about the low reliabilities found
for some of the subscales for African language groups (Meiring, Van
de Vijver, & Rothmann, 2006). Moyo and Theron (2011) conclude that
while the reliability and validity of the 15FQ+ are moderate, their
research revealed that it is ‘a noisy measure of personality amongst
black South African managers’ (p. 19). This suggests that there is a lot
of error in the measurement of personality using the 15FQ+ with this
group. They call for further research with other groups and suggest
that in the meantime the results should be interpreted with caution
when applying the 15FQ+ to black South Africans.

12.3.2 The Big Five model of personality traits


During the last two decades many personality researchers have come
to the conclusion that the domain of personality traits may be
accurately described and summarised in terms of five broad traits.
These traits are often labelled as Extraversion, Neuroticism or
Emotional stability, Agreeableness, Conscientiousness, and
Openness to Experience or Intellect or Culture. This conclusion is
based on the observation that the factor analysis of different
personality scales and personality descriptive adjectives mostly result
in a five-factor solution that corresponds with the factors just listed.
These factors, which are jointly referred to as the Big Five model of
personality, have also been identified in many different cultures.
However, attempts to identify the Big Five structure in South Africa
have yielded mixed results. Using the internationally developed NEO
PI-R, De Bruin (2000) found support for the Big Five factors among
white Afrikaans-speaking participants, whereas Heaven, Connors, and
Stones (1994), Heaven and Pretorius (1998), and Taylor (2000) failed
to find support for the factors among black South African participants.
However, Taylor and De Bruin (2003) reported support for the Big Five
factors among the responses of Afrikaans, English, Nguni, and
SeSotho speakers to the Basic Traits Inventory, which is a South
African measure of the Big Five model. This finding shows that
measures of personality that take local conditions into account may
yield more satisfactory results than imported measures that fail to take
local conditions into account. As a response to this, an ambitious
project is well underway to develop the South African Personality
Inventory (SAPI). Aspects of the indigenous measure’s development
are covered in 12.3.9 and also in Chapters 6 and 7 (see Sections
6.2.1.3 and 7.7).

12.3.3 The Basic Traits Inventory


The development of the Basic Traits Inventory (BTI) started in 2002.
At that stage there were no cross-culturally appropriate, locally
developed personality inventories available in South Africa. Taylor and
De Bruin (2006) decided to develop a new measure of personality for
South Africa, using the Big Five factors that have been shown to have
cross-cultural applicability worldwide (McCrae, Costa, Martin, Oryol,
Rukavishnikov, Senin, Hrebícková & Urbánek, 2004). The intention
was to ensure the construct validity of the BTI from the outset, which
demanded the specification of precise definitions of the five factors.
Each of the five factors is divided into five facets, except for the
Neuroticism factor, which has only four facets. The definitions for the
factors and the facets associated with each are presented in Table
12.2. The items in the BTI were then written in the form of statements
with a five-point Likert-type scale. The items are grouped according to
their respective facets, and these are presented together for each
factor, instead of in random order. Contrary to convention, the items
are all keyed positively in the direction of their dimension.
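Because every BTI item is keyed in the same direction on a five-point scale, scoring is essentially the summation of item responses into facet scores, and of facet scores into factor scores. The sketch below illustrates this generic logic; the item-to-facet mapping and the responses are invented for the example and are not the actual BTI key.

# Generic Likert scoring sketch; the facet map and responses are hypothetical.
facet_map = {
    "Gregariousness": [1, 2, 3],        # item numbers, invented for illustration
    "Positive Affectivity": [4, 5, 6],
}
responses = {1: 4, 2: 5, 3: 3, 4: 2, 5: 4, 6: 5}  # answers on the 1-5 scale

# Positively keyed items mean no reverse scoring is needed: facet scores are
# plain sums of item responses, and the factor score is the sum of its facets.
facet_scores = {facet: sum(responses[i] for i in items)
                for facet, items in facet_map.items()}
factor_score = sum(facet_scores.values())
print(facet_scores, factor_score)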
The five factors of the BTI have consistently shown good internal
consistency reliability, and the factor structure has been replicated in a
number of different contexts and across gender, ethnic, and language
groups (De Bruin et al., 2005; Taylor, 2004; Taylor, 2008; Taylor & De
Bruin, 2006). Ramsay, Taylor, De Bruin, and Meiring (2008) showed
that the BTI functioned equally well across Nguni, SeSotho, and
Sepedi language groups, providing further evidence for the cross-
cultural transportability of the Big Five model of personality, as well as
the use of the BTI in different language groups. The BTI has been
used in South African research investigating the relationship between
the Big Five and emotional intelligence (Venter, 2006), academic
cheating (De Bruin & Rudnick, 2007), life balance (Thomson, 2007),
coping (Govender, 2008), individualism/collectivism (Vogt & Laher,
2009), and teamwork (Desai, 2010). A short form of the BTI is
available for research purposes, as it only provides scores on the Big
Five factors. The BTI is available in paper-and-pencil format and
online, and both formats are available in English and Afrikaans. The
BTI also has norms available for adolescents, as it was determined
that the items work equally well for younger populations (Taylor & De
Bruin, 2017). Table 12.2 summarises the personality factors measured
by the BTI.
Table 12.2 Personality factors measured by the Basic Traits Inventory

Extraversion (E): Enjoys being around other people, likes excitement, and is cheerful in disposition. Facets: Gregariousness, Positive Affectivity, Ascendance, Excitement-seeking, Liveliness.

Neuroticism (N): Experiences negative emotions in response to their environment. Facets: Anxiety, Depression, Self-consciousness, Affective instability.

Conscientiousness (C): Is effective and efficient in how they plan, organise, and execute tasks. Facets: Order, Self-discipline, Dutifulness, Effort, Prudence.

Openness to Experience (O): Is willing to experience new or different things and is curious. Facets: Aesthetics, Actions, Values, Ideas, Imagination.

Agreeableness (A): Is able to get along with other people and has compassion for others. Facets: Straightforwardness, Compliance, Modesty, Tender-mindedness, Prosocial tendencies.

12.3.4 The Hogan Personality Inventory


The Hogan Personality Inventory (HPI) was developed in the early
1980s as a measure of personality for the workplace. It uses the five-
factor model as a taxonomy, but has adjusted the model slightly to
measure seven differentiated elements of personality. These are
Adjustment, Ambition, Sociability, Interpersonal sensitivity, Prudence,
Inquisitive, and Learning approach. The HPI is grounded in
socioanalytic theory, so the feedback provided to the individual is in
terms of their reputation in the workplace, rather than the way they
perceive themselves (Hogan Assessment Systems, 2012).
The HPI enjoys a long history of consistent research, where the
scales are retained and improved over time in accordance with their
ability to predict performance in the workplace. The South African
manual reports reliability coefficients between .67 and .84 for the
scales of the HPI. There were slight mean differences between ethnic
group scores on the HPI scales, but little evidence of bias, and the
differences were not large enough to warrant the development of
separate norm scores for ethnic groups (Hogan Assessment Systems,
2012).
12.3.5 The Occupational Personality Questionnaire
The Occupational Personality Questionnaire (OPQ) is a popular
personality assessment tool in the workplace. The OPQ exists in a
variety of forms, of which the most recent is the OPQ32. The OPQ32
measures 32 work-related personality characteristics, and is used in
industry for a wide variety of purposes, namely personnel selection,
training and development, performance management, team building,
organisational development, and counselling.
The OPQ is not based on any particular personality theory. The
authors adopted an eclectic approach and included personality traits
studied by other psychologists such as Cattell, Eysenck, and Murray.
In addition, personality traits and characteristics deemed important in
the workplace were added. Items were written for each of the traits or
characteristics, and the scales were then refined through item and
factor analyses. Note that in contrast to the 16PF, factor analysis was
used to refine scales rather than to identify constructs. The reliabilities
of the 32 scales for a representative South African sample ranged
between .69 and .88 and in general appear to be satisfactory. Visser
and Du Toit (2004) demonstrated that the relations between the 32
scales may be summarised in terms of six broad personality traits.
These were labelled as Interpersonal, Extraversion, Emotional
stability, Agreeableness, Openness to experience, and
Conscientiousness. You will note that the last five of these factors
correspond to the factors of the Big Five model of personality traits.
Furthermore, Visser and Viviers (2010) investigated the structural
equivalence of the OPQ32 normative version for black and white
South Africans. They found that the internal consistency reliabilities for
black and white South African samples ‘were acceptable … although
the mean alpha for the black group was substantially lower than that
of the white group’ (p. 8). Furthermore, although Visser and Viviers
(2010) found partial support for the structural equivalence of the
OPQ32 across the black and white groups, they caution that further
research is needed before the measure can be used with confidence
across different South African cultural groups.
12.3.6 The Myers-Briggs Type Indicator
Another popular standardised personality questionnaire is the Myers-
Briggs Type Indicator (MBTI). This measure is based on Jung’s theory
of psychological types. The MBTI instrument sorts people into types
instead of measuring along a trait continuum. It consists of four bipolar
scales, namely Extraversion-Introversion (E-I), Thinking-Feeling (T-F),
Sensing-Intuition (S-N), and Judging-Perceiving (J-P). People scoring
high on Extraversion tend to direct their energy to the outside world
and seek interaction with other people. Introverts direct their energy
to their inner world of ideas and concepts, and tend to value privacy
and solitude. Sensing individuals rely on information gained through
the senses and can be described as realistic. Intuitive individuals rely
on information gained through unconscious perception and can be
described as open-minded. People with a preference for Thinking
make decisions in an objective and logical way, whereas Feeling
individuals prefer decisions made on subjective and value-based
grounds. Those showing a preference for Thinking may be described
as tough-minded, and those showing a preference for Feeling may be
described as tender-minded. Individuals with a preference for
Judging seek closure and an organised environment, while
individuals with a preference for Perceiving are characterised by
adaptability and spontaneity (Lanyon & Goodstein, 1997).
By combining the four poles of the scales, it is possible to identify
16 personality types. One such type might, for example, be ENTP.
People who are assigned this type show preferences for Extraversion,
Intuition, Thinking, and Perceiving. Another type might be ISFJ.
Individuals who are assigned this type show preferences for
Introversion, Sensing, Feeling, and Judging. In contrast to the 16PF,
where the emphasis falls on determining a person’s position in terms
of each of the first- and second-order traits, the emphasis in the MBTI
instrument falls on assigning the individual to one of 16 different types
of people. People who are assigned the same type are assumed to
share similar characteristics. The MBTI manual and several other
publications provide detailed descriptions of the characteristics
associated with each of the 16 types. The MBTI is used widely for
career counselling, team building, and for organisational and personal
development, but not for selection (McCaulley, 2000). It is assumed
that different personality types function better in different
environments, and ideally a match between the individual and the
environment should be sought. The MBTI instrument may also be
profitably used as an indicator of the roles that people are likely to play
in work-related teams. One attractive feature of the MBTI instrument is
that no single type is considered superior to the other types. It is
assumed that each of the 16 types has something positive to offer in
different contexts. Similarly, the two poles of each of the four scales
are presumed equally desirable. The result is that individuals can
always receive positive feedback with an emphasis on their strengths
rather than weaknesses.
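The way four bipolar preferences combine into one of the 16 type codes can be sketched as follows. The signed-score convention here is purely illustrative (the actual MBTI applies its own scoring rules):

# Illustrative only: the real MBTI applies its own scoring rules.
def mbti_type(e_i, s_n, t_f, j_p):
    """Each argument is a signed preference score; positive picks the first pole."""
    return (("E" if e_i > 0 else "I")
            + ("S" if s_n > 0 else "N")
            + ("T" if t_f > 0 else "F")
            + ("J" if j_p > 0 else "P"))

print(mbti_type(e_i=3, s_n=-2, t_f=1, j_p=-4))  # prints 'ENTP'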
The MBTI manual provides evidence that the scales provide
internally consistent scores (Myers & McCaulley, 1985). Factor
analyses of the MBTI items have also provided support for the validity
of the four scales in South Africa (De Bruin, 1996; Van Zyl & Taylor,
2012). The psychometric properties of the MBTI Form M have been
investigated in South Africa, demonstrating good internal consistency
reliability, and good functioning across age, gender, and ethnic groups
(Van Zyl & Taylor, 2012). The MBTI is available in English and
Afrikaans versions.
The Murphy-Meisgeier Type Indicator for Children (MMTIC) was
developed as a type indicator for children and adolescents. The
MMTIC is well-known internationally, and generally has good reliability
and validity evidence. The latest international revision was done in
2015 (Murphy & Meisgeier, 2008), and the relevant validation studies
in South Africa are yet to be conducted.
Also based on Jung’s personality theory, the Jung Personality
Questionnaire (JPQ) is a criterion-referenced personality measure for
adolescents. In South Africa, the JPQ was developed for use with
white English- and Afrikaans-speaking adolescents in Grades 9 to 12,
and is used for career advising and counselling purposes. There is
limited research on the validity and cross-cultural use of the JPQ in
South Africa. Research is urgently needed to update and establish the
psychometric properties of the JPQ for all South African cultural and
language groups.

12.3.7 The Minnesota Multiphasic Personality Inventory (MMPI)
Unlike the 16PF, which was developed by following a factor-analytic
approach, and the MBTI, which was developed by following a
theoretical or construct approach, the Minnesota Multiphasic
Personality Inventory (MMPI) was developed using a criterion-keying
approach (see Chapter 6). The test-taker is confronted by a number
of statements (e.g. ‘I think that people are all out to get me’), to which
the response choice has to be either ‘True’ or ‘False’. The MMPI was
originally developed to assess personality characteristics indicative of
psychopathology (DeLamatre & Schuerger, 2000). Items were
developed and included in the final version of the MMPI based on their
ability to discriminate between psychiatric patients and normal people.
The 550 items were grouped into 10 clinical scales as well as three
validity scales. The clinical scales are as follows: Hypochondriasis,
Depression, Hysteria, Psychopathic deviate, Masculinity-Femininity,
Paranoia, Psychasthenia, Schizophrenia, Mania, and Social
introversion. The validity scales were included to provide information
about test-taking behaviour and attitudes and response sets (e.g.
carelessness, malingering, acquiescence, answering in socially
desirable ways). By studying the validity scales, it can be determined
whether the person is ‘faking good’ (deliberately putting themselves in
a favourable light) or ‘faking bad’ (deliberately putting themselves in a
negative light) – both of which could be indicative of malingering.
As the MMPI was designed to identify pathological personality
characteristics, it was used widely in mental hospitals and for
psychodiagnostic purposes, but was not considered to be a useful
measure of ‘normal’ personality traits. Thus, when the MMPI was
revised, the brief was to broaden the original item pool to make it more
suitable for ‘normal’ individuals as well. Added to this, the revision
focused on updating the norms, revising the content and language of
the items to make them more contemporaneous and non-sexist, and
on developing separate forms for adults and adolescents. The end
result was the publication of the MMPI-2 (Hathaway & McKinley,
1989). The MMPI-2 consists of 567 true/false questions, requires a
reading level of at least Grade 8, and takes about 90 minutes to
complete. The first 370 statements are used to score the original 10
clinical and three validity scales. The remaining 197 items, of which
107 are new, are used to score an additional 104 validity, content, and
supplementary scales and subscales.
The MMPI has not been adapted for use in South Africa but South
African research studies have been undertaken. Becker (2013)
investigated the reliability of the MMPI from an existing user database
and found that the reliability coefficients for the 16 content scales and
six of the 10 clinical scales were above .70. The four clinical scales
with lower reliability scores were in the same range as the reliability
scores obtained in the international research on the MMPI-2.
According to Graham (2006), reports of low internal consistency in the
clinical subscales are as a result of the empirical manner in which
these scales were developed. Taylor and Vorster (2015) also
investigated the psychometric properties of the MMPI-2 in a sample of
nuclear operators and found similar technical results to Becker (2013)
and the international research (Butcher et al., 2001). They also
concluded that the MMPI-2 did not overdiagnose pathology within the
non-clinical sample, rendering it fit for use in high-risk selection
contexts in South Africa.
The Personality Inventory for Children (PIC) was constructed
using a similar methodology to that of the MMPI. As was the case with
the MMPI, it has been revised and the revised version is known as the
PIC-R. It can be used with children and adolescents from 3 to 16
years of age. However, parents or other knowledgeable adults
complete the true/false items, not the children. Thus, the PIC and PIC-
R are not self-report measures. Rather, they rely on the behavioural
observation of a significant person in the child’s life. Parents, however,
can also be biased. The inclusion of validity scales in the PIC and
PIC-R provide some indication of the truthfulness and trustworthiness
of the responses. The assessment practitioner is advised to combine
the information obtained from the PIC and PIC-R with personal
observations and other collateral information.
While the PIC-R has been adapted, translated, and researched in South Africa with preschool and Grade 1 samples, no research has been conducted with the Personality Inventory for Youth (PIY), the related self-report measure. The PIC-R research suggests that it can
be used cross-culturally in South Africa, but that the large proportion
of double negatives in the items is problematic (Kroukamp, 1991).
Having looked at a range of personality measures in sections
12.3.1 to 12.3.7, it is particularly worrying that the development,
adaptation, and research into the use of structured objective
personality questionnaires for children and adolescents across all
South African cultural groups lags behind the progress being made in
terms of adult personality assessment.

12.3.8 Cross-cultural use of structured personality assessment measures
It is common practice for psychologists to take an assessment
measure developed in one culture and to adapt it for use in another
culture (see Chapter 7). This is known as an etic approach to
assessment. In terms of the assessment of personality, adopting an
etic approach implies adapting and using measures that tap
dimensions of personality that we assume are universal (Berry, 2000;
Berry, Poortinga, Segall & Dasen, 2002; Markus & Kitayama, 1998).
For example, the Big Five personality traits (discussed in Section
12.3.2) have received tremendous support for their cross-cultural
applicability worldwide (i.e. they are considered to be universal traits).
Using an etic approach, examples of internationally developed
personality measures adapted for use in South Africa are the 16PF
(Section 12.3.1), the HPI (Section 12.3.4), the MBTI (Section 12.3.6)
and MMPI (Section 12.3.7). Psychologists often use such measures to
make cross-cultural research comparisons. One important
precondition for cross-cultural comparisons with structured
assessment measures, such as personality questionnaires and
inventories, is that the constructs that they measure must have the
same meaning in the two cultures. This is not always easily achieved
and assessment instruments can often be said to be biased. One way
of defining bias is to say that the measure does not measure the same
construct in the different cultures, and therefore no valid comparisons
between people of the different cultures can be drawn (see Chapter
7). The assessment of bias is usually done by examining the patterns
of interrelationships between the items included in the measure. If
these relationships are not the same in the different cultures, the
conclusion can be drawn that the measure is not measuring the same
constructs and that the measure is therefore biased (Huysamen,
2004; Paunonen & Ashton, 1998; Poortinga & Klieme, 2016). A further
way to explore whether a measure is measuring the same construct
across cultures is to compare the personality factor structure across
cultures in order to determine the structural equivalence of the
measure across cultures.
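One widely used statistic for such structural comparisons is Tucker’s congruence coefficient, computed between the factor loadings obtained in each group; values above roughly .95 are conventionally read as indicating factorial equivalence. A minimal sketch with made-up loadings follows:

import numpy as np

def tucker_phi(x, y):
    """Tucker's congruence coefficient between two factor-loading vectors."""
    return float(np.dot(x, y) / np.sqrt(np.dot(x, x) * np.dot(y, y)))

# Hypothetical loadings of five items on one factor in two language groups.
group_a = np.array([0.71, 0.65, 0.70, 0.58, 0.62])
group_b = np.array([0.69, 0.63, 0.72, 0.55, 0.60])
print(round(tucker_phi(group_a, group_b), 3))  # near 1.0, i.e. similar structure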
Paunonen and Ashton (1998) pointed out that if an assessment
measure does not measure exactly the same construct in different
cultures all is not lost. What is required for valid interpretation is that
the psychologist should know what the meaning of particular
constructs (as represented by scores for personality measures) is in
the culture in which he or she intends to use it. However, much
research needs to be done before psychologists will be able to state
confidently what scores on the 16PF, BTI, MBTI, JPQ, HSPQ, CPQ,
and so on, mean for different groups in South Africa.
A challenge when using an etic approach is that while the
measures tap personality constructs that ‘are universal for all human
individuals, they are also conceived as culturally neutral’ (Iliescu &
Ispas, 2016, p. 135). The problem is that in an etic approach, culture-
specific theories, constructs, and meanings are not included, while
Western-centric theories and personality dimensions are imposed on
test-takers from another cultural context. For this reason, it is argued
that choosing an emic approach is potentially better. Adopting an
emic approach in personality assessment requires the development of
an indigenous personality measure based on the delineation of
constructs and the personality structure derived in a specific culture
(i.e. a bottom-up approach rather than the top-down etic one).
However, a criticism of the emic approach is that although it provides
‘a better understanding of individual behavior relevant in a specific
culture’ (Iliescu & Ispas, 2016, p. 137), it makes comparison across
cultures impossible as universal aspects of personality might not be
sufficiently covered. It is for this reason that when developing
structured personality-assessment measures for use in cross-cultural
research and applied settings, a combined emic-etic approach has
become popular (Iliescu & Ispas, 2016). The combined emic-etic
approach includes the consideration of both universal (etic) and
culture-specific, indigenous constructs (emic) when developing a
measure (Cheung, Van de Vijver & Leung, 2011). Among the
measures developed using a combined emic-etic approach are the
Cross-Cultural Personality Assessment Inventory (CPAI) (Cheung,
Leung, Fan, Song, Zhang & Zhang, 1996), and the South African
Personality Inventory (SAPI) (Fetvadjiev, Meiring, Van de Vijver, Nel &
Hill, 2015; Nel, Valchev, Rothmann, Van de Vijver, Meiring & De
Bruin, 2012). The development of the SAPI will be outlined in the next
section as an example of using a combined emic-etic approach in
developing an African-centred personality measure.

12.3.9 The South African Personality Inventory (SAPI)


As was mentioned in Section 12.3.2, numerous research studies have
found that measures of personality that take local contexts and
cultures into account are likely to yield better results than international
measures adapted for use here using an etic approach. Consequently,
a team of national and international experts in cross-cultural
personality assessment are in the final phase of developing the South
African Personality Inventory (SAPI) (Fetvadjiev, Meiring, Van de
Vijver, Nel & Hill, 2015; Nel et al., 2012).
The SAPI was developed using a combined emic-etic approach
(see 12.3.8) for the diverse South African cultural and language
groups. Table 12.3 draws on the work of Fetvadjiev et al. (2015) and
Nel et al. (2012) to briefly outline how the SAPI was developed.
Readers should consult these two publications to gain a deeper
understanding of the SAPI’s development.
Research into the psychometric properties of the SAPI is ongoing.
Fetvadjiev et al. (2015) found evidence of construct validity and internal consistency, and showed that there is sufficient comparability of the six-factor structure across cultural groups. Their findings
also suggested that items in some of the facets might need refining.
Table 12.3 The stages in developing the South African Personality Inventory (SAPI)

STAGE ONE: Developing a qualitative South African personality structure

Steps: An emic-driven, lexical approach was used. The steps entailed:
• Eliciting personality descriptions from participants in SA’s 11 official language groups.
• The personality descriptions were provided in participants’ home languages.
• The descriptions were translated into English and then verified by language experts.
• The data were cleaned in that only words related to personality were retained.
• The words (personality descriptors) were content analysed and categorised into facets.
• The development team and cultural and language experts interrogated and adapted the categorisation of the descriptors into facets.
• Semantic clustering of the facets into sub-clusters was undertaken using a range of lexical resources.
• The sub-clusters were categorised into clusters. The clusters and sub-clusters represented the emergent qualitative personality structure.

Outcomes: 188 initial facets were reduced to 37 initial sub-clusters using semantic cluster analysis, which were in turn reduced to nine main clusters, namely:
• Extraversion
• Soft-heartedness
• Conscientiousness
• Emotional stability
• Intellect
• Openness
• Integrity
• Relationship harmony
• Facilitating

STAGE TWO: Developing, piloting, reducing and retaining items, and final refinement of the personality structure

Steps:
1. Items were developed for each of the clusters and pilot tested.
2. The items were reduced; those retained had to show good construct representation, not overlap with other facets or clusters (based on factor loadings), and use wording that was easily understandable and translatable.
3. Cultural and linguistic experts then reduced the items further.
4. A multi-ethnic study was then done and a factor analysis performed, which led to the final items being selected.
5. The factor analysis findings led to the reduction of the nine clusters to six dimensions and 18 facets (Fetvadjiev et al., 2015).

Outcomes:
• 2 574 items were initially developed across the nine clusters.
• The various reduction activities led to the items being reduced to a final number of 146.
• The nine clusters were refined to arrive at the final structure of the SAPI, which has six dimensions and 18 facets, namely:
– Conscientiousness: achievement-orientation, emotional maturity, integrity, orderliness, traditionalism-religiosity
– Extraversion: playfulness, sociability
– Intellect/openness: broad-mindedness, epistemic curiosity, intellect
– Negative social-relational: conflict-seeking, deceitfulness, hostile egoism
– Neuroticism: negative emotionality
– Positive social-relational: facilitating, interpersonal relatedness, social intelligence, warm-heartedness

❱❱ CRITICAL THINKING CHALLENGE 12.3

The SAPI was developed using a combined emic-etic approach. An emic-driven, lexical approach was used to allow the personality dimensions and facets to
emerge from the ground up by getting personality descriptions from ordinary
people. Thereafter the development team used a range of qualitative
and quantitative approaches, and consulted various resources and
experts, to arrive at the final six dimensions and 18 facets.
Consult the dimensions and factors included in the personality measures
for adults discussed in Sections 12.3.1 to 12.3.7. If you find similarities and
differences, does this suggest that the SAPI is tapping some universal (etic)
personality traits as well as some unique, more indigenous (emic) traits?

12.4 Level 2: Assessment of motives and personal concerns
McAdams (1994) argues that knowledge of stable personality traits is
not enough to know a person completely. Traits provide a
decontextualised picture of an individual; that is, a picture of how the
person generally behaves across many different situations. This view
does not acknowledge the importance of the context in which
behaviour takes place. It also does not give insight into the motives
that underlie behaviour. Traits also do not provide any information
about a person’s dreams or about what a person wishes to achieve.
McAdams (1994) calls the constructs held in this second level of
personality ‘personal concerns’ (p. 376), which have to do with a
person’s goals and motives: What they want and how they go about
getting it.
Pervin (1983) suggested a theory of personality that explains
human behaviour in terms of goal-directed behaviour. He maintains
that behaviour can be understood through the study of the pursuit and acquisition of the goals an individual sets. These goals have an underlying
structure and can function in conflict with one another at times.
Personality problems can therefore stem from not having goals,
having goals in conflict, and experiencing frustration in the
achievement of goals. Pervin (1989) also suggests that goals
influence thoughts and emotional reactions, and that they are
available consciously, but do not have to be in conscious awareness
during the pursuit of the goal.
In the assessment of motives, McClelland, Koestner and
Weinberger (1989) distinguished between explicit and implicit motives.
Explicit motives relate to a person’s self-conceptualisation of their
desires and are generally measured by self-report questionnaires, in a
similar way to traits. Implicit motives relate to subconscious
motivations for behaviour. Winter, Stewart, John, Klohnen, and
Duncan (1998) argue that these motives cannot be measured and
assessed in the same way as traits. Motives need to be assessed by
means that access implicit aspects of personality. Motives ‘involve
wishes, desires or goals’ (Winter et al., 1998, p. 231) and are often
unconscious or implicit. Hence, it is necessary to use indirect methods
to assess implicit motives.

12.4.1 Explicit motives


One of the assessments associated with the measurement of explicit
motives is the Motives, Values, Preferences Inventory (MVPI). This
assessment was specifically developed for the workplace and looks at
10 drivers that influence behaviour. These 10 motives are seen to
influence the kind of projects a person may take on, the type of work
environment that they enjoy, and also the environment they would
likely create if they were in a leadership position (Hogan Assessment
Systems, 2014a). These underlying motives, which also reflect work
values, tend to influence what a person perceives to be ‘good’ and
‘bad’ or acceptable or not in the workplace. These so-called
unconscious biases tend to influence behaviour and contribute to a
person’s reputation at work.
The MVPI has been standardised in the South African context, and
shows good evidence for reliability and validity (Hogan Assessment
Systems, 2014a). There was no real evidence for the presence of bias
across gender and ethnic groups, and good factor congruence was
demonstrated across groups.

12.4.2 Implicit motives


Implicit motives are measured by making use of projective
assessment techniques (see Section 12.1). One of the most widely
used methods of gaining insight into unconscious motives is the
Thematic Apperception Test (TAT). This measure is also one of the
most widely used psychological measures in clinical and
psychodiagnostic contexts (Watkins, Campbell, Nieberding &
Hallmark, 1995).
Morgan and Murray (1935) developed the TAT as a way of gaining
insight into implicit motives. This test requires respondents to ‘make up stories about a series of vague or ambiguous pictures’ (Winter et
al., 1998, p. 232). These stories can be presented orally or in writing.
Respondents are instructed to make their stories as dramatic as
possible. Each story is to include a description of what led up to the
event in the picture, what is happening at the moment, and what the
outcome of the story will be. Respondents are also encouraged to
describe the thoughts and feelings of the characters in the stories.
The assumption is made that the individual will project his or her
own wishes, needs, and conflicts into the story that he or she tells.
Hence, the TAT can be used to gain insight into aspects and themes
of personality that relate to basic needs and conflicts that are partially
hidden. It is assumed that the characteristics of the main characters in
the stories reflect tendencies in the respondent’s own personality. It is
further assumed that the characteristics of the main characters’
environments reflect the quality of the respondent’s environment
(Lanyon & Goodstein, 1997).
McAdams (1994) argues that the TAT is most useful as an
indicator of motives. It appears that the TAT measures three major
motives, namely need for intimacy, need for achievement, and need
for power. Objective scoring systems have been devised to score
respondents’ TAT stories in terms of the three major motives.
Although these objective scoring systems for the TAT are available,
most clinicians tend to take an intuitive and impressionistic approach
in interpreting people’s responses to the stimuli. This has led many
researchers to voice their concern over the reliability and validity of the
TAT. When objective scoring systems are used, the inter-scorer
reliability of the TAT seems satisfactory (Murstein, 1972). However,
the test-retest reliability of the TAT does not measure up to
conventional psychometric standards. Winter and Stewart (1977) and
Lundy (1985) investigated the effects of different assessment
instructions on the test-retest reliability of the TAT. They found that
most people assume that it is expected of them to tell different stories
at the retest and that this may account for the low test-retest
reliabilities. When respondents were told that it is not a problem if their
stories are similar to their previous stories, test-retest reliabilities were
substantially improved.
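When objective coding systems are used, inter-scorer reliability for categorical motive codes is often summarised with an agreement statistic such as Cohen’s kappa. The sketch below shows the calculation on invented codings (‘ach’, ‘pow’, and ‘int’ standing for the three major motives); it is a general illustration, not a published TAT scoring routine.

from collections import Counter

def cohens_kappa(codes1, codes2):
    """Cohen's kappa for two equal-length lists of categorical codes."""
    n = len(codes1)
    observed = sum(a == b for a, b in zip(codes1, codes2)) / n
    c1, c2 = Counter(codes1), Counter(codes2)
    expected = sum(c1[k] * c2[k] for k in c1) / (n * n)  # chance agreement
    return (observed - expected) / (1 - expected)

# Two scorers coding the dominant motive in ten stories (invented data).
scorer1 = ["ach", "pow", "int", "ach", "ach", "pow", "int", "ach", "pow", "int"]
scorer2 = ["ach", "pow", "int", "ach", "pow", "pow", "int", "ach", "pow", "ach"]
print(round(cohens_kappa(scorer1, scorer2), 2))  # about 0.70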
Personality researchers and theorists hold different views on the
validity of the TAT. It appears that the best evidence in support of the
criterion validity of the TAT is centred on the three major motives,
namely need for achievement, need for power, and need for intimacy.
Groth-Marnat (2009) points out that the motives measured by the TAT
influence long-term behaviours rather than short-term behaviours.
Hence, it is not expected that the TAT will correlate with specific acts
or immediate behaviour (McClelland et al., 1989). In a meta-analysis,
Spangler (1992) reported that the TAT does predict long-term
outcomes, such as success in a person’s career. Item Response
Theory methods have been applied to the scoring of the TAT, with the
results providing support for the need for achievement motive
(Tuerlinckx, De Boeck & Lens, 2002).
Retief (1987) pointed out that successful cross-cultural use of
thematic apperception tests requires that ‘the constructs chosen and
the signs or symbols used in the stimulus material’ (p. 49) should not
be unfamiliar or alien to individuals of the ‘new’ culture. This means
that the stimulus materials often need to be adapted for cross-cultural
use. The stimuli of the original TAT reflect symbols and situations
associated with Westernised contexts. If these stimuli are presented to
individuals from a different culture, the stimuli may take on different
meanings, which would lead to difficulty in interpreting the responses
to the stimuli. Because the nature and quality of the stimulus material
influence the richness and the meaning of the stories or responses
they stimulate, it is important that great attention should be paid to the
selection of people, situations, and symbols that are included in the
pictures of thematic apperception tests.
Psychologists in southern Africa have attempted to counter this
problem by specifically devising thematic apperception tests for use
with black populations. Early attempts at establishing thematic
apperception tests for black southern Africans include those of
Sherwood (1957), De Ridder (1961), Baran (1972), and Erasmus
(1975). However, Retief (1987) commented that some of these
measures appear to be biased and to reflect cultural and racial
stereotypes that may be offensive to some people.
In addition to the unsuitability of many of the original TAT cards for
black South Africans, the stimuli may also seem outdated and
unfamiliar for many white, coloured, and Indian South Africans. It
seems that psychologists in South Africa are in need of a thematic
apperception test that reflects the contemporary cultural complexity
and richness of its society.

12.4.3 Other projective methods of personality assessment


Other popular projective methods that may provide useful information
on implicit or unconscious aspects of personality include the
Rorschach Inkblot Test, the Draw-A-Person test, and sentence-
completion tests.
In the Rorschach test, individuals are shown pictures of inkblots.
The task of the test-taker is to report what he or she sees in the
inkblots. Several sophisticated scoring methods have been developed
for the Rorschach Inkblot Test. See Section 9.3.3 in Chapter 9 and
consult Moletsane (2004), and Moletsane and Eloff (2006) for a
discussion on the Adjusted Rorschach Comprehensive System that
they developed for South African children with little exposure to
Western-oriented tests. Examples of the adaptations included allowing
the test-takers to respond in the language they are comfortable with
and using drawings to enhance their description of what they ‘saw’ in
the inkblot. The response rate of the test-takers increased when the
adjusted procedure was used, which allowed for a richer
understanding of aspects of the test-taker’s personality.
In the Draw-A-Person test, the individual is generally instructed to
draw a picture of a person. The individual may thereafter be instructed
to draw a picture of someone of the opposite sex than the person in
the first picture. Some psychologists may also request the individual to
draw himself or herself. In therapeutic situations, psychologists may
proceed by asking the individual questions about the picture(s) that
the test-taker has drawn. The Draw-A-Person Test and how it can be
adapted for use with African children is discussed further in Section
15.3.4 in Chapter 15.
Sentence-completion tests require the individual to complete
sentences of which only the stem has been given, such as ‘I get upset
when …’ With all three methods described above, the assumption is
made that the individual will project aspects of his or her personality
into his or her responses. However, the same psychometric criticisms
directed against the TAT are also directed against these methods. For
a more comprehensive discussion of these methods, interested
readers are referred to Groth-Marnat (2009).

12.5 Personality and counterproductive behaviour at work
When selecting people for positions in the workplace, most of the
focus is on the positive characteristics that a person possesses that
would enable him or her to be successful in that position. However,
there is also a need to ensure that the person being hired will not pose
a risk to the organisation or the people that they work with due to
negative behaviours. These negative behaviours are described as counterproductive work behaviours (CWB) and may be either deliberate or unconscious responses to stress in the workplace. Types of CWB range from seemingly innocuous behaviours, such as being manipulative or blaming others when things go wrong, to interpersonal abuse or absenteeism, and finally to serious crimes such as theft, fraud, and destruction of company property. Two assessments look at
these negative behaviours from a personality perspective, namely the
Work-related Risk and Integrity Scales (WRISc), and the Hogan
Development Survey (HDS).
12.5.1 Work-related Risk and Integrity Scales (WRISc)
Van Zyl and De Bruin (2016) developed the Work-related Risk and
Integrity scales (WRISc) in response to a need for a locally
constructed assessment of potential counterproductive work
behaviour. Tests used for the assessment of CWB are generally
referred to as integrity tests (Berry, Sackett & Wiemann, 2007) and
could either be overt (asking directly whether a person has or is likely
to engage in CWB) or covert (based on personality where the purpose
of the assessment is not obvious). The WRISc is a covert measure,
and identifies 12 narrow personality constructs that are empirically
linked to engaging in CWB.
The 12 scales of the WRISc are: Aggression, Callous affect,
Cynicism, Egotism, External locus of control, Impulsivity, Low effortful
control, Manipulation, Negative affect, Pessimism, Risk-taking and
Rule defiance. Several of these scales can also be combined to give
an indication of complexes or toxic personality profiles that can have a
profoundly negative impact on the organisation.
The WRISc is a South African-developed test and therefore has local norms and a local research base. The test has good internal consistency reliability, with Cronbach alpha values above .74 for all scales. The structure of the assessment appears sound, and no practical evidence of bias was found in any of the scales. Correlations with measures of counterproductive behaviours provide good evidence of concurrent validity (Van Zyl & De Bruin, 2016).
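For readers who wish to connect these reported values to their underlying computation, Cronbach’s alpha for a scale consisting of k items is given by the standard formula (shown here for general reference; it is not reproduced from the WRISc manual):

$$\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma^{2}_{Y_i}}{\sigma^{2}_{X}}\right)$$

where \(\sigma^{2}_{Y_i}\) is the variance of item i and \(\sigma^{2}_{X}\) is the variance of the total scale score. Against the common convention that values of .70 and above indicate acceptable internal consistency for research purposes, the WRISc values above .74 compare favourably.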

12.5.2 Hogan Development Survey


The Hogan Development Survey (HDS) is a measure of what is
known as the ‘dark side’ of personality. The HDS gives an indication of
the possible performance risks for an individual when stressed, tired,
bored, or complacent in the workplace. The 11 scales of the HDS are
often referred to as ‘derailers’ and they provide scores that indicate
how likely it is that a person is going to engage in these negative work
behaviours at any given time. The higher the risk, the more likely the
person is to behave in a way that will harm their performance in the
workplace.
The 11 scales of the HDS are Excitable, Sceptical, Cautious,
Reserved, Leisurely, Bold, Mischievous, Colourful, Imaginative,
Diligent, and Dutiful. Each describes a behaviour that could be an overused strength that a person relies on when their resources for managing their behaviour are low. These behaviours negatively affect the person’s reputation in the workplace, which in turn could limit their likelihood of success.
The HDS has been standardised in South Africa, and has good
reliability and validity evidence across cultures and gender groups
(Hogan Assessment Systems, 2014b). The scales of the HDS have been shown to be negatively correlated with emotional intelligence (Taylor & De Beer, 2010) and with Adjustment on the Hogan Personality Inventory (HPI).

12.6 Uses and applications of personality tests
Below are a few examples of contexts in which personality tests are
used.
Research
Personality measures are often used as part of psychological research
to help gain a deeper understanding of how psychological constructs
are related to each other, to compare personality profiles across
cultures, and so on.
Self-development
Personality tests are often used in personal-development programmes
where the aim is to develop deeper self-understanding and
psychological growth.
Career and academic counselling
When used in conjunction with other career assessments (see
Chapter 13), personality tests can help provide guidance as to the
kinds of careers and academic subjects that would best suit an
individual.
In psychotherapy
Assessments of normal personality may help the client and
psychotherapist to understand relatively stable patterns of behaviour
that may contribute to the client’s difficulties. They may also reveal
strengths in the client’s psychological make-up that may be
emphasised and built on in psychotherapy. Assessments may also be
used to evaluate whether any changes in stable behaviour patterns
are occurring as a result of psychotherapy. Clinical assessments of
personality, such as the MMPI, can be used to help diagnose
psychological disorders.
Personnel selection
Personality measures can have considerable utility in selection procedures.
They can either be linked to competencies (see Chapter 15 and the
case study below) or be used in the form of benchmark profiles that
describe high performers. The basic premise is that certain personality
traits are related to performance on the job, and so can help select the
best candidate when applicants have similar qualifications and
experience.

❱❱ CASE STUDY 12.1

Applying personality assessments for selection in the workplace


A large organisation was in the process of recruiting security personnel and wanted
to include a personality assessment as a part of the selection assessment battery.
The industrial psychologist in charge of the selection process reviewed the available research on personality as a predictor of the behaviour of security officers, and decided to use a Big Five measure of personality. After looking at all the options, she chose the
Basic Traits Inventory. After a thorough job analysis, she matched the relevant facets
of the BTI to the various aspects of the job. Two of these aspects were: Staying calm
under pressure, and Rule-following. She matched two Neuroticism scales, Anxiety
and Affective instability, to Staying calm under pressure, and set low scores on these
scales as ideal. For Rule-following, she chose high scores on the Conscientiousness
scales Prudence and Dutifulness as indicators. After the security officers had been
through a CV-screening and reference-checking process, they completed the battery
of assessments. Based on their final profiles, the psychologist drew up a ranked spreadsheet of the applicants, with the top candidates progressing to the next round of selection. This ensured that, at an early stage in the process, candidates with the best psychological profiles moved on to the more qualitative assessment stages of selection.
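The ranking step described in the case study can be made concrete with a small computational sketch. The sketch below is purely illustrative: the facet names are taken from the case study, but the candidate scores, the use of z-scores, and the unit-weighted fit index are invented for the example and do not represent the BTI’s actual scoring or norms.

```python
# Purely illustrative ranking sketch; facet names are from the case study,
# but scores, directions, and the fit index are invented for this example.

# Ideal directions from the job analysis: -1 = low scores desired,
# +1 = high scores desired (scores assumed to be standardised z-scores).
ideal_directions = {
    "Anxiety": -1,
    "Affective instability": -1,
    "Prudence": +1,
    "Dutifulness": +1,
}

# Hypothetical candidate profiles (z-scores on the four facets).
candidates = {
    "Candidate A": {"Anxiety": -0.8, "Affective instability": -0.5,
                    "Prudence": 1.1, "Dutifulness": 0.9},
    "Candidate B": {"Anxiety": 0.6, "Affective instability": 0.2,
                    "Prudence": 0.3, "Dutifulness": -0.1},
}

def fit_score(profile):
    """Unit-weighted sum of facet scores, each aligned with its ideal direction."""
    return sum(direction * profile[facet]
               for facet, direction in ideal_directions.items())

# Rank candidates from best to worst fit.
for name in sorted(candidates, key=lambda n: fit_score(candidates[n]), reverse=True):
    print(f"{name}: fit = {fit_score(candidates[name]):+.2f}")
```

In practice, of course, a psychologist would weigh such a composite index alongside other information, such as references and interviews, rather than rely on it alone.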

12.7 Conclusion
In order to understand the uniqueness of the individual, it is necessary
to examine personality on all three levels of McAdams’ scheme. The
first level of the scheme provides insight into the individual’s typical
ways of interacting with the world and characteristic patterns of
behaviour. Structured personality assessment tools such as the 16PF,
BTI, OPQ, MMPI and SAPI may help psychologists to understand and
describe these typical patterns of interaction and behaviour. The
information gained by using such tools may also be useful in
predicting how an individual is likely to behave in the future. However,
these assessment measures do not provide insight into the reasons or
motives for a person’s behaviour. In order to achieve this,
psychologists need to move to the second level of McAdams’ (1995)
scheme where the focus falls on personal concerns, projects, and
motives. Projective techniques, such as the TAT, may be useful in
gaining insight into implicit and unconscious aspects of personality.
This may provide insight into the reasons for a person’s behaviour. By
combining information gained on both these levels, and by integrating
it with an individual’s life story, the psychologist can construct an
integrated and rich picture of the person who is being assessed.

❱❱ ETHICAL DILEMMA CASE STUDY 12.1

Thembi Mdluli is an industrial psychologist working for a financial institution called ACME Bank. ACME Bank requires new bank tellers to fill vacancies across its
operations in South Africa. ACME Bank has indicated that they require bank tellers
with specific behavioural traits, such as Detail consciousness, Risk aversion, and
Rule compliance. Thembi has decided that prospective bank tellers should complete
the Thematic Apperception Test (TAT) in order to determine whether they meet the
behavioural requirements for the job. She has also decided that the
Minnesota Multiphasic Personality Inventory (MMPI) is necessary because all
employees should be psychologically healthy. In order to save time, Thembi will
only use the results of these assessments to select the bank tellers and will not
obtain feedback from the respondents to contextualise the results. Thembi argues
that prospective bank tellers do not need to be given feedback on their
assessment results because this will increase the cost of the selection process.

Questions:
1. What are the possible ethical dilemmas with regard to the use of personality assessments in this case study?
2. What are the possible ethical dilemmas with regard to the assessment process?
3. Are there any professional or legal issues that would relate to
personality assessment (e.g. scope of practice)?
4. What assessments and process would you recommend if you
were in Thembi’s situation?

CHECK YOUR PROGRESS 12.1


12.1 Draw a distinction between structured and projective methods
of personality assessment.
12.2 Discuss potential problems with cross-cultural use of
structured personality assessment methods.
12.3 Discuss the potential usefulness of a combined emic-etic
approach when developing measures for use in multicultural
contexts, using the BTI and SAPI as examples.
12.4 How was factor analysis used in the development of the 16PF?
12.5 What criticisms are commonly directed at projective methods
of personality assessment?
12.6 Discuss potential problems with the cross-cultural use of the TAT.

ADDITIONAL READING: STUDYING PERSONALITY ACROSS CULTURES

Cheung, F.M. & Fetvadjiev, V.H. 2016. Indigenous approaches to testing and assessment. In F.T. Leong, D. Bartram, F.M. Cheung, K.F. Geisinger, & D. Iliescu (Eds.), The ITC international handbook of testing and assessment (pp. 333−346). New York, NY: Oxford University Press.
Cheung, F.M., van de Vijver, F.J.R. & Leong, F.T.L. 2011. Toward a new approach to the study of personality in culture. American Psychologist, 66, pp. 593−603.
Fetvadjiev, V.H., Meiring, D., Van de Vijver, F.J., Nel, J.A. & Hill, C. 2015. The South African Personality Inventory (SAPI): A culture-informed instrument for the country’s main ethnocultural groups. Psychological Assessment, 27(3), pp. 827−837.
Iliescu, D. & Ispas, D. 2016. Personality assessment. In F.T. Leong, D. Bartram, F.M. Cheung, K.F. Geisinger, & D. Iliescu (Eds.), The ITC international handbook of testing and assessment (pp. 134−146). New York, NY: Oxford University Press.
Valchev, V.H., Nel, J.A., Van de Vijver, F.J.R., Meiring, D., De Bruin, G.P., & Rothman, S. 2013. Similarities and differences in implicit personality concepts across ethnocultural groups in South Africa. Journal of Cross-Cultural Psychology, 44, pp. 365−388. DOI: 10.1177/0022022112443856.
Valchev, V.H., Van de Vijver, F.J.R., Nel, J.A., Rothman, S. & Meiring, D. 2013. The use of traits and contextual information in free personality descriptions across ethnocultural groups in South Africa. Journal of Personality and Social Psychology, 104, pp. 1077−1091. DOI: 10.1037/a0032276.

There have been quite a few stops to make in the Types of Measure
Zone. You only have two left. At the next stop in this zone, you will
take a look at some of the measures used in career assessment.
Chapter 13
Career-counselling assessment
GIDEON P. DE BRUIN AND KARINA DE BRUIN

CHAPTER OUTCOMES
Once you have studied this chapter you will be able to:
› identify the central theoretical concepts and assumptions of four approaches to career-counselling assessment: the person-environment fit, developmental, systems, and career construction/life-design approaches
› identify the major differences between these four approaches
› identify points of criticism against the approaches
› name the most important measures and techniques associated with each of the four approaches.

13.1 Introduction
Career counselling can be defined broadly as the process in which a
professional career counsellor guides clients in identifying their current
and potential skill strengths and career options that provide them with
the best opportunities to use these strengths (Ryan Krane & Tirre,
2005). Career counselling is a comprehensive process and includes
any counselling activity related to the career decisions a client has to
make (Zunker, 2016). Increasingly, individuals do not make a single career choice for life but, rather, make numerous career decisions. The current work environment, which is marked by volatility, uncertainty, complexity, and ambiguity, forces individuals to be flexible and adaptable; this often results in career decisions that, in turn, lead to regular job and career changes.
Zunker (2016) describes several career-counselling models, each
consisting of a number of stages. Although these models use a wide
range of techniques, the stages appear to be very similar. Career
counselling usually includes an intake interview, assessment,
diagnosis, and problem identification. These steps are followed by a
counselling process that focuses on a collaborative relationship with
the client, intervention strategies, and the evaluation of outcomes and
future plans.
Although standardised assessment does not dominate career
counselling in all these models, it plays an important role in gathering
information about the client. In this chapter we aim to provide an
overview of career-counselling assessment from four different
theoretical points of view. The first two, namely the person-
environment fit and developmental approaches, represent traditional
assessment approaches. The last two, the systems and career
construction/life-design approaches, are strongly influenced by
constructivism and represent a qualitative assessment approach.
There are at least three general reasons why assessment in career
counselling is important: ‘(1) to stimulate, broaden, and provide focus
to career exploration; (2) to stimulate exploration of self in relation to
career; and (3) to provide what-if information with respect to various
career choice options’ (Prediger, 1974, p. 338). Savickas, Nota,
Rossier, Dauwalder, Duarte, Guichard, Soresi, Van Esbroeck and Van Vianen (2009) and Maree (2010a) add that career
counselling/assessment is useful in assisting clients to design their
lives in the new postmodern world of work. Assessment in career
counselling is likely to be important irrespective of which theoretical
approach the career counsellor follows. However, as you will see in
the sections that follow, each of the theoretical approaches
emphasises different aspects of the assessment process. Individual
career counsellors often have a preference for certain assessment
tools and techniques. Some find the use of standardised
psychological tests useful, whereas others may feel more comfortable
with qualitative methods. For others, a combination of standardised
psychological tests and qualitative assessment techniques may be
more appealing. To best provide for the needs of the client, a
comprehensive and collaborative approach to career assessment –
including both standardised psychological tests and qualitative
assessment procedures – is often the most desirable approach
(Maree & Morgan, 2012).

13.2 The person-environment fit approach
Early in the twentieth century, Parsons (1909) stated the basic
principles of the trait-and-factor approach to career counselling, which
has since evolved into the person-environment fit approach. When
choosing a career a person needs to:
• know himself or herself
• know the world of work
• find a fit between his or her characteristics and the world of work.

This approach to career counselling was enthusiastically espoused by applied psychologists and psychologists interested in individual differences.
differences. Differential psychologists developed psychological tests
and questionnaires that could be used to indicate differences in
intelligence, specific abilities, personality traits, and interests. Using
such tests and questionnaires, counselling psychologists and career
counsellors could determine the relative ‘strengths’ and ‘weaknesses’
of their clients. The task of the career counsellor became one of
helping the client to better understand his or her abilities, personality
traits, and interests. The next step was to match the acquired self-
knowledge with knowledge about the world of work, and to choose
an appropriate occupation.
This approach to career counselling is still widely used, although
contemporary writers emphasise the dynamic aspects of person-
environment fit (Sampson, 2009). This approach supports the idea
that successful work adjustment depends on correspondence between
an individual’s characteristics and the characteristics of the working
environment (Holland, 1997). Every person has needs that must be
met and skills that he or she can offer to the working environment.
Similarly, every working environment has needs and reinforcers (for
example financial, social, and psychological benefits) to offer to the
individual. Ideal work adjustment will take place if the individual’s skills
correspond to the needs of the working environment and the
reinforcers of the working environment correspond to the needs of the
individual (Dawis, 2002; Dawis & Lofquist, 1984).
Next, we will discuss several domains that form an important part
of assessment from a person-environment fit approach.
13.2.1 Assessing intelligence
Although intelligence measures have been formally used for almost a
hundred years, there is still no consensus among psychologists about
what intelligence really is (see Chapter 10). However, psychologists
do know that scores on well-constructed intelligence measures are
powerful predictors of academic success and work performance (De
Bruin, De Bruin, Dercksen & Hartslief-Cilliers, 2005; Kuncel, Hezlett &
Ones, 2004; Rosander, Bäckström & Stenberg, 2011).
Intelligence measures can be used profitably in career-counselling
contexts, but career counsellors should keep in mind that many other
factors also play important roles in the prediction of achievement (cf.
Poropat, 2009; Rosander et al., 2011). Such factors may include
socio-economic status, quality of schooling, test anxiety, and
measurement error. Another important factor in multicultural contexts
is that measures may provide biased scores when used with groups
that differ from the standardisation sample (see Chapter 7). Therefore,
it is not wise to base a decision about whether a person will be able to cope with the cognitive demands of a particular academic course, training course, or occupation on a total IQ score alone.
should seek additional information that may confirm the scores on the
intelligence measure, such as reports of previous academic or work
performance. Career counsellors should also explore the role of
factors such as language and socio-economic status in an individual’s
performance on an intelligence measure.
Both individual and group intelligence measures lend themselves
to being used in career-assessment contexts. You were introduced to
a variety of these measures in Chapter 10. Probably one of the most
widely used group intelligence measures for career-counselling
purposes in South Africa is the General Scholastic Ability Test (GSAT;
Claassen, De Beer, Hugo & Meyer, 1991). The Raven Progressive
Matrices (Raven, 1965) are also popular as they are non-verbal
measures (De Bruin et al., 2005). South African norms are available
for the three different forms of this instrument (JvR Psychometrics,
2015). The Wechsler Adult Intelligence Scale (Claassen, Krynauw,
Paterson & Wa Ga Mathe, 2001) is often used for individual
assessment purposes. The fourth edition of this scale has been
adapted for South Africa and is available as the WAIS-IVSA (Wechsler,
2014).

13.2.2 Assessing aptitude


Career counsellors have long relied on measures of specific abilities
or aptitudes to try to match clients with specific occupations. The
assumption is that different occupations require different skills or
competencies. Measures of specific abilities may be used to assess
whether an individual has the potential to be successful in an
occupation. According to Zunker (2016), ‘aptitude tests primarily
measure specific skills and proficiencies or the ability to acquire
certain proficiencies’ (p. 186). Because most aptitude measures are
multidimensional (i.e. they are designed to measure several different
aptitudes), they may also be used to identify relative cognitive
strengths and weaknesses. The aptitude measures discussed in
Chapter 10 lend themselves to being used in career-counselling
contexts.

13.2.3 Assessing vocational interests


An individual’s scores on an interest inventory reflect his or her liking
or preferences for engaging in specific activities and/or occupations
(Isaacson & Brown, 1997). MacAleese (1984) identified three general
purposes of career-interest inventories: to identify interests of which
the client was not aware; to confirm the client’s stated interests; and to
identify discrepancies between a client’s abilities and interests. From a
collaborative career-counselling perspective, the scores on an interest
inventory provide useful material for conversation between counsellor
and client.
The first widely used interest inventory was developed in the US by
Strong (1927). The Strong Interest Inventory has been revised several
times, and it remains one of the most researched and widely used
interest inventories in the US. Several interest inventories have been
developed and standardised in South Africa or adapted for the South
African context.
The Self-directed Search (SDS) is based on the theory of John L.
Holland (1997) and was adapted for South African conditions by Du
Toit and Gevers (1990). It was standardised for all South African
groups and can be used with high school students and adults. The
SDS measures interests for six broad fields: Realistic (practical),
Investigative (scientific), Artistic, Social, Enterprising (business),
and Conventional (clerical). Psychometric analyses of many different
interest inventories have revealed that these six broad fields provide a
satisfactory and useful summary of all the important fields of interest
(De Bruin, 2002). These six fields can also be used to describe
working environments. According to Holland (1997), individuals should
strive to choose careers that will lead to congruence between an
individual’s interests and the characteristics of the working
environment.
Holland’s (1997) theory also describes the relationships between
the six interest fields. Specifically, Holland predicts that the interest
fields are ordered in a circular fashion (R-I-A-S-E-C). Interest fields
that lie next to each other on the circumference of the circle are
assumed to be more closely related than fields that lie further apart.
Unfortunately, there is evidence that the proposed circular structure of
the SDS is not valid for black South African high school students (Du
Toit & De Bruin, 2002) and other non-Westernised samples (Nauta,
2010). There is, however, evidence that the circular structure does
exist in some non-Western cultures, such as China (Yang, Lance, &
Hui, 2006). Career counsellors should thus be cautious when
interpreting the SDS scores for different cultural groups.
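Holland’s circular hypothesis has a concrete statistical implication that can be checked in any dataset: correlations between the six interest scales should decline as the circular distance between fields increases, with adjacent fields (such as R and I) correlating more strongly than opposite fields (such as R and S). The sketch below illustrates this check; the correlation matrix is hypothetical and does not come from the SDS or the SACII.

```python
# Sketch of a circumplex check for Holland's R-I-A-S-E-C ordering.
# Rows/columns are ordered R, I, A, S, E, C; the correlation values
# are hypothetical, not real SDS or SACII data.
import numpy as np

r = np.array([
    [1.00, 0.55, 0.30, 0.15, 0.28, 0.52],
    [0.55, 1.00, 0.50, 0.27, 0.18, 0.31],
    [0.30, 0.50, 1.00, 0.53, 0.29, 0.14],
    [0.15, 0.27, 0.53, 1.00, 0.51, 0.33],
    [0.28, 0.18, 0.29, 0.51, 1.00, 0.56],
    [0.52, 0.31, 0.14, 0.33, 0.56, 1.00],
])

def circular_distance(i, j, n=6):
    """Shortest number of steps between two fields on the R-I-A-S-E-C circle."""
    d = abs(i - j) % n
    return min(d, n - d)

# Mean correlation at each circular distance (1 = adjacent, 3 = opposite).
for dist in (1, 2, 3):
    pairs = [(i, j) for i in range(6) for j in range(i + 1, 6)
             if circular_distance(i, j) == dist]
    mean_r = np.mean([r[i, j] for i, j in pairs])
    print(f"circular distance {dist}: mean r = {mean_r:.2f}")

# A circumplex-consistent matrix shows mean correlations that decline
# monotonically with circular distance (here: .53 > .30 > .16).
```

Formal tests of this ordering exist (for example randomisation tests of hypothesised order relations), but the basic logic is the simple monotonic pattern shown above.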
Recently, Morgan, De Bruin and De Bruin (2015a) developed the
South African Career Interest Inventory (SACII), which is also based
on Holland’s circular model of vocational interests (Holland, 1997).
Whereas the South African SDS represents an adaptation of the North
American SDS, the content of the SACII was developed from the
outset for the multi-ethnic and multi-lingual South African context.
Morgan et al. (2015a) and Morgan, De Bruin and De Bruin (2015b)
reported strong evidence in support of the validity of Holland’s circular
R-I-A-S-E-C structure in the SACII and the equivalence of this
structure across different ethnic and gender groups. These results
support the validity of the SACII in the South African context.
Several other interest inventories have been developed in South
Africa. The MB-10 (Meyer, 2003) measures interests for 10 different
fields, namely Working with individuals, Working with groups,
Business, Numbers, Reading and writing, Art, Handicraft and machine
work, Science, Animals, and Plants. The MB-10 was standardised for
high school learners from the Western Cape, and first-year students at
the University of Stellenbosch. Research has generally found that the
MB-10 has suitable psychometric properties (Rabie, 2005).
The Maree Career Matrix (MCM) (Maree & Taylor, 2016) asks
individuals how they feel about 152 listed careers. More specifically,
they must indicate: (1) the extent to which they are interested in each
of the careers, regardless of their ability to do the kind of work
required in the particular career; and (2) how confident they are that
they will be able to do the work in the particular career. The two sets
of scores result in a two-dimensional career matrix. The MCM has
acceptable psychometric qualities, is easy to use, and can be applied
to individuals and large groups of people with a minimum of Grade 9
second-language English or Afrikaans proficiency (Maree & Taylor,
2016).
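Because each career receives two ratings, the MCM’s results can be thought of as placing careers on a two-dimensional grid of interest against confidence. The sketch below illustrates this idea with invented careers, ratings, cut-offs, and quadrant labels; it does not reproduce the MCM’s actual scoring rules or terminology.

```python
# Invented illustration of a two-dimensional interest-by-confidence matrix;
# careers, ratings, cut-offs, and labels are hypothetical, not from the MCM.
ratings = {
    "Electrician": (4, 5),  # (interest, confidence) on a 1-5 scale
    "Journalist":  (5, 2),
    "Accountant":  (2, 4),
}

for career, (interest, confidence) in ratings.items():
    if interest >= 3 and confidence >= 3:
        quadrant = "high interest, high confidence: explore further"
    elif interest >= 3:
        quadrant = "high interest, low confidence: build confidence first"
    elif confidence >= 3:
        quadrant = "low interest, high confidence: possible fallback"
    else:
        quadrant = "low interest, low confidence: unlikely fit"
    print(f"{career}: {quadrant}")
```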

13.2.4 Assessing personality


The aim of personality assessment is to identify an individual’s salient
personality characteristics and to match these characteristics to the
requirements of occupations. For example, if a person is extroverted
and group-dependent, occupations that allow a person to have social
contact and to work in groups may be more desirable than
occupations where the person has no social contact. The assumption
is made that personality traits reflect basic and relatively stable
tendencies to behave in certain ways (Larsen & Buss, 2014).
Individuals also seek out environments that correspond with their
personality traits, for example extroverts seek out situations with lots
of social activity, whereas introverts seek out situations where they
can avoid social contact. Next, we mention three standardised personality measures that may be employed fruitfully in the career-counselling context. These measures were discussed in more detail in Chapter 12.
The 16 Personality Factor Questionnaire (16PF; Cattell, Eber &
Tatsuoka, 1970) is often used during career counselling. Several
personality profiles have been developed for different occupations,
and an individual’s profile is then compared to these profile types. If
the individual’s profile fits the profile type for a specific occupation, it
would suggest that this occupation may be worth exploring further.
Similarly, if the individual’s profile differs significantly from the profile
type for a certain occupation, the individual may want to explore
alternative options. However, career counsellors should keep in mind
that, even within a particular occupation, there can be large
differences between people in terms of personality traits. To assume
that there is one particular configuration of personality traits for any
particular occupation is an oversimplification. The fifth edition of the
16PF (16PF5) has been adapted for the South African population with
norms available for students and working adults (Institute for
Personality and Ability Testing, 2006, 2009). As discussed in Chapter
12, research has shown that this instrument works well across
different cultural and language groups in South Africa.
The Myers-Briggs Type Indicator (MBTI; Myers & McCaulley,
1985) aims to categorise people into one of 16 different personality
types based on their scores for four bipolar scales. It is assumed that
people who belong to the same personality type share the same
personality characteristics, and that they differ from individuals who
belong to another type in important ways (McCaulley, 2000). A further
assumption is that different personality types are differentially suited
for different occupations. The task of the career counsellor and the
client becomes one of exploring the client’s personality type and
identifying environments that correspond with this type. Research
results support the psychometric validity of the MBTI in the South
African context (Van Zyl & Taylor, 2012). The MBTI Career Report is a
useful tool as part of a career counselling process.
The Basic Traits Inventory (BTI; Taylor & De Bruin, 2006) was
developed to assess the Big Five personality factors in the South
African population. The five factors that are measured are:
Neuroticism, Extroversion, Openness to experience, Agreeableness
and Conscientiousness. The construct validity of the five-factor model
has received empirical support in the South African context (Vogt &
Laher, 2008, 2011; Taylor & De Bruin, 2006). The BTI has norms
available for the general South African population as well as for
students.

13.2.5 Assessing values


Zunker (2016) emphasises the importance of value clarification in
career-counselling assessment. Values may be considered as
important motivators of behaviour, because people strive to achieve or
obtain the things that they value and to move away from the things
that they do not value.
The Values Scale (Langley, Du Toit & Herbst, 1992a) may be used
to assess the relative value that an individual places on activities. The
scale measures 22 different values that can be grouped into six broad
clusters: Inner orientation, Material orientation, Autonomous lifestyle,
Humanism and religion, Social orientation, and Physical orientation
(Langley, 1995). Carvalho (2005) found support for all these broad
clusters in a group of South African university students, except for
Material orientation. The Values Scale is standardised for English,
Afrikaans, and African language groups in South Africa.

13.2.6 Evaluation of the person-environment fit approach to career counselling assessment
The person-environment fit approach rests on the assumption that,
if an individual knows himself or herself and the world of work, the
individual should be able to make a satisfying career choice by
selecting a work environment that matches his or her abilities,
interests, and personality (Sharf, 2013). However, this approach does
not recognise that individuals may be differentially ready to make
career choices. Some individuals can benefit from a self-exploration
exercise and can readily integrate information about the world of work,
while others are not mature enough (in a career decision-making
sense) to usefully implement such information. The person-
environment fit approach also assumes that people are free to choose
from a variety of available career options. However, in many
developing countries such as South Africa, job opportunities are
scarce and few individuals have the privilege to choose a career from
the available options. The majority do not have a choice in this regard
and may be forced to take any career opportunities that may come
their way.
To address some of the shortcomings of this approach, more
recent career-counselling approaches have begun to incorporate
career identity and adaptability in the process, rather than focusing
solely on person-environment fit (Savickas et al., 2009; Watson & Kuit,
2007). Research has also found that factors beyond person-
environment fit play a role in career-related behaviours (Hirschi &
Läge, 2007; Tracey, 2007). Therefore, taking several psychological
measures to identify salient abilities, interests, and personality
characteristics may be of limited value for many individuals. Lastly,
assessment from a person-environment-fit perspective can only be
useful if the psychological tests and questionnaires used in the
assessment provide reliable and valid scores. There is a growing
concern about the appropriateness of assessment results for diverse
groups that did not form part of the standardisation and validation of
the assessment measure (Zunker, 2016). Due to the differences in
language, socio-economic status, beliefs, and educational
background, it cannot be taken for granted that measures developed
for one group can be used with other groups. In this regard,
psychometricians should either:
• demonstrate that the measures tap the same characteristics for different groups, or
• develop new measures that measure the same characteristics for the different groups (see Chapter 7).
Despite the above criticisms, the person-environment fit approach has
been a dominant force in career counselling and several important
psychometric instruments have been developed specifically for career-
counselling purposes.

13.3 The developmental approach to career counselling assessment
During the 1950s, Donald Super introduced the concept of career
maturity and, by doing so, placed career decision-making within the
context of human development (Crites, 1981). From a developmental
perspective, career choice is not viewed as a static event but, rather,
as a developmental process that starts in childhood and continues
through adulthood. According to Super (1983), career maturity refers
to the degree to which an individual has mastered the career-
development tasks that he or she should master in his or her particular
developmental stage. By assessing an individual’s level of career
maturity, the career counsellor may identify certain career-
development tasks with which the individual has not dealt
successfully. These tasks may then become the focus of further
counselling.

13.3.1 Career assessment in the developmental approach


Super’s (1990) model for developmental career-counselling
assessment consists of four stages: preview, depth-view, assessment
of all the data, and counselling. During the first stage, the career
counsellor reviews the client’s records and background information.
Based on this review, as well as a preliminary interview with the client,
the counsellor formulates a plan for assessment. In the second stage,
the counsellor assesses the client’s work values, the relative
importance of different life roles, career maturity, abilities, personality,
and interests. During the third stage, the client and career counsellor
integrate all the information to understand the client’s position in terms
of the career decision-making process. The final stage involves
counselling with the aim of addressing the career-related needs
identified during the assessment process.
The assessment of interests, values, personality, and abilities was already discussed in the previous section; these assessments typically form part of a developmental approach to career counselling. Measures for the assessment of career maturity and role salience, respectively, are particularly relevant to Super’s career-development theory.
Langley (1989) identified five components of career maturity: self-
knowledge; decision-making; career information; the integration of
self-knowledge with career information; and career planning.
The Career Development Questionnaire (CDQ; Langley, Du Toit, &
Herbst, 1992b) may be used to assess the career maturity of high
school and university students. Research has found evidence that the
CDQ is reliable and valid in the South African context (De Bruin &
Bernard-Phera, 2002; Pieterse, 2005).
With regard to role salience, Super (1983, 1994) emphasised
that the role of ‘worker’ is not equally important to everybody.
Assessing work-role salience should therefore form an important part
of the career-counselling process. Super distinguishes between five
major arenas in which an individual’s life roles are played:
• workplace
• community
• family
• academic environment
• leisure.
The Life Role Inventory (LRI; Langley, 1993) was designed to
measure the relative importance of each of these five roles. It was
standardised for white and black South African high school students.
A factor analysis of the LRI by Foxcroft, Watson, De Jager, and
Dobson (1994) indicated that it is not possible to distinguish
psychometrically between the roles of worker and student for South
African university students.
The Career Development Questionnaire and Life Role Inventory are unique to the developmental approach to career-counselling assessment. However, one can question the continued use of these measures, given that they were developed more than two decades ago and therefore do not take the impact of changes in the world of work into account.
13.3.2 Evaluation of the developmental approach to
career counselling assessment
The validity of the developmental approach to career counselling in
the South African context has been questioned by several authors (De
Bruin & Nel, 1996; Naicker, 1994; Stead & Watson, 1998). The
developmental approach adds to the person-environment fit approach
in that it emphasises the developmental nature of career decision-
making. The developmental approach also emphasises the need to
focus on an individual’s specific career-counselling needs. However,
this approach makes certain assumptions that are of questionable
value in the context of (a) a developing country characterised by
poverty, unemployment, and crime; and (b) the world of work, which is
characterised by change, uncertainty, and volatility. These
assumptions are as follows:
• career development follows a predictable path with clearly defined stages
• the developmental tasks associated with every stage have to be successfully completed before the client can move on to the next stage
• if a person is career mature, he or she will be able to make a successful career choice.
These assumptions do not hold in developing countries where
individuals often do not have the financial resources to complete their
schooling, where schooling is often disrupted by social and political
unrest, and where job opportunities are scarce. In such countries,
individuals are often forced to find any employment that may be
available, without considering how well the characteristics of the
working environment can be integrated with their interests, values,
and abilities. Also, some individuals may be highly career mature and
may still find it very difficult to find any employment.

13.4 The systems approach to career counselling assessment
In an attempt to integrate the many existing theories and concepts
related to career development into a single framework, career
theorists and practitioners turned to the systems theory framework
(STF; Patton & McMahon, 1999). According to Patton, McMahon and
Watson (2017, p. 89), the STF can provide ‘the basis for an
overarching or metatheoretical framework within which all concepts of
career development described in the plethora of career theories can
be usefully positioned and utilised in theory and practice’. In addition,
the STF denotes ‘the complex interplay of influences through which
individuals construct their careers’ (p. 89). Constructivism has
informed the development of the STF (McMahon & Watson, 2008),
resulting in a collaborative counselling process with the counsellor and
client as more equal partners, specifically focusing on the client’s
construction of his or her own meaning of career.
The STF represents both content and process influences. Both
these influences are set within a time system of past, present, and
future. The past influences the present and both the past and present
have an influence on the future. Content influences refer to the
intrinsic personal characteristics and the effect of the context in which
the individual lives. These subsystems are interconnected and include
the individual, other people, organisations, society, and the environment (McMahon, 2011; McMahon & Watson, 2008). The
individual system includes intrapersonal influences such as age,
gender, interests, personality, and abilities. The individual system
forms part of a larger contextual system – accommodating the social
system and the environmental-societal system. The social system
refers to other people systems relevant to the individual, for example
family, friends, and educational institutions. The individual system and
the social system are nested in the environmental-societal system
(including politics, technological developments, and globalisation),
which may have a significant influence on the individual’s career
development. These systems can change and are constantly
interacting with one another – implicating a process of dynamic, open
systems.
Process influences include three aspects: recursiveness, change
over time, and chance. Recursiveness refers to the interaction of the
various influences. Changes in one system will have an influence on,
and may lead to changes in, another subsystem. Secondly, the nature
and degree of influence change as time goes by. With reference to
change, the career development of individuals can no longer be
planned, logical, and predictable. Chance events, including
unexpected happenings such as illness, downsizing, and natural
disasters, may also hugely impact the individual’s career development
(McMahon, 2011; McMahon & Watson, 2008).

13.4.1 Career assessment in the systems approach


The systems approach to career counselling provides an excellent
framework for incorporating all the other career-counselling
approaches discussed. Although its main focus is on qualitative
processes, quantitative measures could effectively be used to support
the counsellor in obtaining information about the content influences
(for example abilities, aptitudes, personality, interests, and values). An
important goal of career assessment within the STF is to identify
‘connections between client’s experiences and various elements from
their system of influences … including the past, present, and future’
(McMahon, Patton & Watson, 2003, p. 195).
McMahon et al. (2003) provide suggestions to guide the
development of qualitative career-assessment measures. These
include the need to ground the assessment process in theory, the
testing of the process, completing the process in a realistic time frame,
understandable instructions for the client, providing for a flexible
process, cooperation between the counsellor and client, and a
debriefing process. These suggestions were adhered to in the
development of the My System of Career Influences reflection activity
(MSCI; McMahon, Watson & Patton, 2005). The MSCI consists of a
booklet in which the individual is required to reflect on his or her
present career situation, including aspirations, work experience, life
roles, previous decision-making, and support networks (Patton et al., 2017). In the next steps, the individual identifies and prioritises the
systems that influence him or her, for example the self, others,
society, and the environment. Thereafter, he or she reflects on past
career influences, present circumstances, and anticipated future
lifestyle. According to McMahon and Watson (2008, p. 280), the
process helps individuals ‘to explore the systemic influences on their
careers and, in so doing, to tell their career stories’. Finally, they
summarise these reflections to construct their own unique MSCI
diagrams. The MSCI is available in two different versions for adults
and adolescents, respectively, and can be used with both individuals
and groups. Both versions are particularly useful across cultures. The
assessment actively involves the client in the process and provides an
avenue along which the meaning that clients attach to their career
development can be explored (McMahon et al., 2003).

13.4.2 Evaluation of the systems approach to career counselling assessment
Constructivism in career theory and counselling has influenced career
counsellors to increasingly adopt qualitative methods of career
assessment in addition to, or as an alternative to, standardised
psychological testing. The systems approach to career counselling
assessment has resulted in the STF. Although traditional career theories have paid much attention to intrapersonal variables such as interests, personality, and values, other important intrapersonal influences have received little attention (Patton et al., 2017). These include sexual orientation, gender, ethnicity, and disability. The systems theory approach therefore provides the individual with the opportunity to consider the impact of a broader range of intrapersonal variables on
their career development. Individuals cannot be separated from their
environments – their behaviour should be understood within the
context that it occurs. The systems approach places a strong
emphasis on the impact of social and environmental influences by
considering the larger contextual system of the individual.
The MSCI adds to the growing number of qualitative assessment
tools and is a result of stringent theoretical, conceptual, and practical
refinements over many years (Patton et al., 2017). The approach also
provides for the creativity of the counsellor to develop his or her own
assessment techniques and processes, particularly in combination
with more traditional approaches to career counselling.

13.5 Career construction theory and life-design counselling
The constructivist approach in career counselling has also led to
career construction theory (CCT) and life-design counselling
(Savickas, 2005). CCT is based on the differential approach (person-
environment approach), the developmental approach, and the
psychodynamic approach to career counselling and assessment
(Hartung, 2007). The theory focuses on life structure, career decision-
making, career-adaptability strategies, thematic life stories, and
personality styles in career counselling (Savickas, 2005). Clients and
counsellors collaborate to construct clients’ careers by ‘imposing
meaning on their vocational behavior and occupational experiences’
(p. 43). Furthermore, CCT provides a style of intervention that can
help clients deal with global workplace challenges and career-life
transitions (Maree, 2017).
Life-design counselling (Savickas et al., 2009) is an offshoot of
CCT (Savickas, 2005) and includes the theory of self-construction
(Guichard, 2005). The life-design counselling model has a lifelong,
holistic, and contextual focus. This approach focuses on a career as
one part of a person’s life and explores clients’ adaptability,
narratability, activity, and intentionality in the counselling process
(Savickas et al., 2009). In this regard, Savickas et al. (2009) argue
that because life consists of a series of interconnections between life
domains, it is no longer possible to focus only on career development
and person-environment fit in career counselling and assessment.
The life-design counselling model broadly consists of three
phases: encouraging clients to tell small stories about their lives;
reconstructing these stories into larger stories; and co-constructing
future stories. Maree (2013) summarises the general steps of the life-
design counselling model. These include: (1) co-defining the aim of
the intervention through a working alliance between the counsellor
and client; (2) guiding the client towards exploring how he or she sees
and organises him- or herself; (3) revising the client’s story to make it
more objective and broaden perspectives to reorganise the story; (4)
integrating the client’s problems and challenges with the revised story
that may lead to a ‘new’ role and identity; (5) identifying activities and
developing an action plan to actualise the new identities; and (6)
establishing short- and long-term follow-ups.

13.5.1 Assessment in career construction theory and life-design counselling
CCT and life-design counselling mostly use qualitative approaches to
career assessment. However, Maree (2013, p. 412) notes that these
counsellors ‘…are as interested in interpreting the subjective aspects
of career counselling … as they are in interpreting clients’ “objective”
test results.’ Quantitative and qualitative approaches are therefore
often combined to obtain data that can help clients to make well-
informed career-choice decisions (Maree, 2017).
The Career Construction Interview, formerly known as the Career
Style Inventory (Savickas, 1989), is a qualitative assessment method
that can be used ‘as a method for attaining a more comprehensive
and personally meaningful representation of the self’ (Taber, Hartung,
Briddick, Briddick & Rehfuss, 2011, p. 274). It consists of a series of
questions that elicit information on the client’s life structure, career
adaptability strategies, life themes, and personality style (Hartung,
2007). The questions allow clients to tell small stories about
themselves that show who they are and who they wish to become.
The career counsellor listens closely to these stories, asks follow-up
questions, and uses reflective techniques to clarify meaning. The
counsellor and client then use the responses to ‘co-construct a life-
career portrait’ (Hartung, 2015, p. 116). The counsellor and client
therefore collaboratively ‘shape the themes culled from these micro-
stories into a macro-narrative about the person’s central
preoccupation, motives, goals, adaptive strategies, and self-view’
(Hartung, 2015, p. 115).
The Career Interest Profile (CIP; Maree, 2007) is another
qualitative tool that provides the counsellor with objective and
subjective data that can be discussed with the client during
counselling (Maree, 2010b). It combines various methods of
assessing a client’s career interest profile, including formal
questionnaires, interviews, observations, and informal questionnaires.
The combination of these methods helps the client to narrate his or
her career story, ‘thus enabling them to listen to, authorise, and fine-tune these stories, and, ultimately, become actively involved in the process of self- and career construction’ (Di Fabio & Maree, 2013, p. 42). The CIP can be administered individually or in groups.
Career counsellors who employ a qualitative approach to career
assessment rely on numerous other techniques, including genograms,
card sorts, lifelines, storyboarding, early recollections, and creative
writing. McMahon and Watson (2015) provide a comprehensive
review of these and other qualitative techniques that could be useful in
terms of career assessment and counselling.

13.5.2 Evaluation of career construction theory and life-design counselling
CCT and life-design counselling have emerged out of a postmodern
approach to career counselling (Savickas et al., 2009). The world of
work and career counselling have experienced significant changes in
recent years (Maree, 2010a; Morgan, 2010). In this regard Savickas
(2010) argues that people are facing increased career uncertainty and
life-long employment is becoming less common. Assignments have
replaced jobs and networks have replaced organisations. Navigation
through the world of work has therefore become increasingly more
difficult and as a result people need to start viewing a career as part of
their life as they form their life story (Savickas, 2011). The strength of
CCT and life-design counselling is that they are idiographic in nature
and therefore focus on the individual him- or herself rather than on test
scores. This allows the client to be the expert of his or her life rather
than the counsellor acting as expert (Maree, 2010a; Savickas, 2005).
In line with this, the qualitative assessment methods used within CCT
and life-design counselling provide clients with the opportunity to be
actively involved in the assessment process (De Bruin & De Bruin,
2017). These approaches can be applied successfully in developed
and developing contexts, as well as with individuals and groups
(Maree, 2017).

❱❱ CRITICAL THINKING CHALLENGE 13.1

James Mhlangu is 18 years old and in Grade 12. He comes from the rural areas
of KwaZulu-Natal. He was awarded a scholarship to study engineering at any
South African higher-education institution. Although James is very excited about
this opportunity, his parents are not convinced that this is the right career path for
James. The three of them approach you for help with this important career
decision. What career counselling and assessment approach would you follow,
and which measures would you use to assist James to make this difficult
decision? Why would you use the specific assessment tools that you choose?

13.6 Career counselling in a changing environment
We described two traditional approaches as well as two emerging
approaches to career counselling assessment in this chapter. Each
one of these approaches has a different emphasis. The person-
environment fit approach emphasises the fit between a person and his
or her environment. The developmental approach adds to this by
emphasising the readiness of an individual to make a career choice.
The systems approach provides for the influence of various
subsystems on the individual’s career development. Career
construction and life-design counselling explore how people fit their
career into their life stories and how they adapt to changes in their
careers.
All four approaches can be valuable in the career-counselling
context, depending on the particular needs of a client. An emphasis on
one of these at the expense of another may be seen as limiting in the
career-assessment process. The main purpose of career assessment
is to enhance self-awareness. Traditional tools such as personality
and interest inventories may serve this purpose very well. In keeping
with a constructivist and narrative approach, the information gained
from the inventories is not taken as absolute and fixed, but as
information that must be given meaning within the broader system of
influences and life themes of a client. Career counsellors are
encouraged to further explore these and other approaches; in that
way, they will be able to tailor their assessment and interventions
according to what is best for the client, and not according to that with
which they are most comfortable and familiar.
Career counsellors are also encouraged to recognise the
pervasive social and technological changes that are influencing the
way in which work is structured and perceived. Individuals no longer
enter organisations and devote their entire career to that organisation.
Often, they not only change from one organisation to another, but move to whole new fields of work during their career. Increasingly, workers have to be multiskilled and flexible. Career-counselling
assessment can be valuable in this regard by helping individuals
recognise their work-related strengths, skills, and abilities that can be
applied in many different contexts. Values and interests also remain
important, because workers continuously seek to find working
environments that satisfy their psychological needs.

13.7 Conclusion
In this chapter you have been introduced to different approaches to
career counselling, the changing role of assessment in career
counselling, and a range of career-counselling assessment measures.

CHECK YOUR PROGRESS 13.1


13.1 What are the basic features of the person-environment fit approach?
13.2 Which intelligence measures are often used in the career-
counselling situation?
13.3 Discuss the SDS, SACII, MB-10 and MCM as interest inventories.
13.4 When will you use the CDQ and the LRI as assessment measures?
13.5 Which subsystems are accommodated in the systems theory framework and
how can the influence of these on the career development of
the individual be explored?
13.6 Which techniques can be used in qualitative approaches to
career counselling?
13.7 Evaluate the person-environment fit, developmental, systems,
career construction theory and life-design counselling
approaches to career counselling in a continuously changing
work environment.

❱❱ ETHICAL DILEMMA CASE STUDY 13.1

Mrs Nkanyana makes an appointment with Dr Jansen, a psychologist, to do a career assessment with her 16-year-old daughter, Nomvuyo. She mentions that the school
requested such an assessment to guide Nomvuyo into the most appropriate career
direction. More specifically, the school wants to have a full report on Nomvuyo’s
abilities, interests, and personality. Dr Jansen knows that career counselling is not
his area of expertise, but he does not want to turn away this client, especially
because he expects many possible referrals from the school if he provides a
satisfactory service to the client and the school. He consults with a colleague, Ms
Mosia, regarding the career-counselling process and she suggests that he can use
her assessment measures. All these measures have norms only for first-year
university students and most of them were normed before 1996.

Questions
(a) At what point should Dr Jansen decide whether he will take on
this client? If he does continue with the career-counselling
assessment, what ethical risks is he facing?
(b) If Dr Jansen decides to refer the client to his colleague, Ms Mosia, who is
an experienced career counsellor, what challenges might she face in
terms of her available assessment measures? What alternative process
would you suggest to provide appropriate career counselling to the client?

ADVANCED READING
Gysbers, N.C., Heppner, M.J. & Johnston, J.A. 2014. Career counseling: Holism, diversity and strengths. 4th ed. Alexandria, VA: American Counseling Association.
Heppner, M.J. & Jung, A. 2012. When the music changes, so does the dance: With shifting U.S. demographics, how do career centers need to change? Asian Journal of Counselling, 19(1&2), pp. 27–51.
McMahon, M. & Watson, M. Eds. 2015. Career assessment: Qualitative approaches. Rotterdam: Sense.
Osborn, D.S. & Zunker, V.G. 2016. Using assessment results for career development. 9th ed. Boston, MA: Cengage Learning.
Prince, J.P. & Heiser, L.J. 2000. Essentials of career interest assessment. New York: John Wiley.
Seligman, L. 1994. Developmental career counseling and assessment. Thousand Oaks, CA: Sage.
Stead, G.B. & Watson, M.B. Eds. 2017. Career psychology in the South African context. 3rd ed. Pretoria: Van Schaik.
Chapter 14
Computer-based and Internet-delivered assessment
NANETTE TREDOUX

CHAPTER OUTCOMES
In this chapter we will outline the current state of development in computer-based assessment, considering the advantages and potential disadvantages of computer-based and Internet testing, the ethical and practical issues that the practitioner should bear in mind when using such assessment technology, and guidelines for good practice and decision-making.
By the end of this chapter you will be able to:
› understand the features of different types of technology-supported
assessment and critically evaluate new developments in computer-based
and Internet-delivered assessment systems
› critically assess the effect of different computerised or Internet-delivered test
administration systems on the standardisation and fairness of the
assessment process, and use computer-generated assessment reports in a
professionally responsible and discerning manner
› understand your legal, ethical, and professional responsibilities when using
computer-based and Internet-delivered assessment systems, and be aware
of best-practice guidelines regarding the use of computer-based and
Internet-delivered assessments.

14.1 Introduction
The use of computer technology has had a profound effect on
assessment practices. Initially, these changes were seen as revolutionary, but they have become so commonplace that traditional pencil-and-paper assessments are now the exception, rather than the default, for testing in occupational and many
educational, counselling, and clinical settings.
While computer technology contributes to efficiency in test
construction and assessment practice and feedback (Gregory, 2010),
and even makes innovative new means of assessments possible, it
also presents technical and ethical challenges to the practitioner.
Technological advances are proceeding so rapidly that they threaten
to outpace attempts to regulate technology-supported assessment
practices. Practitioners need to make an effort to keep up to date with
technical developments and also exercise critical judgment regarding
the degree to which they choose to make use of new technologies.
Even so, they will often need to make decisions about new technology
and measures that they do not fully understand.

14.2 Some terms explained


Before we go any further, we need to define some of the important
terms that you will find in this chapter.
Computer-based test administration means that a computer
system presents the instructions and the items, times the test or even
the individual items if necessary, and scores the test. Computer-
based test scoring is possible and has been done for many years,
even for tests that have been administered using pencil and paper –
the responses can be entered into the computer manually, or
mechanically, for instance by using an optical mark reader.
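To make the scoring step concrete, here is a minimal sketch in Python. The answer key and captured responses are invented for the example; a real scoring system would add respondent identification, handling of omitted items, and norm look-ups.

    answer_key = ["B", "D", "A", "C", "A"]  # hypothetical answer key
    responses = ["B", "D", "C", "C", "A"]   # captured manually or via optical mark reader

    # Each response that matches the key earns one mark
    raw_score = sum(r == k for r, k in zip(responses, answer_key))
    print(raw_score)  # prints 4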
Computerised adaptive testing relies on mathematical formulae
to select items for a particular respondent (test-taker) based on their
previous answers. Thus, every respondent completes a different set of
items appropriate for them, and the estimate of a person’s ability is
independent of the particular set of items that have been
administered. This approach results in more efficient tests and greater
security over the test items, and also minimises cheating.
Respondents do not have to struggle with many items that are too
difficult for them, or waste time with items that are too easy. The
system stops when the construct has been measured reliably.
Adaptive tests are usually based on Item Response Theory (IRT)
principles (Embretson & Reise, 2000). The Rasch model is essentially
the same as the single-parameter IRT model, and is also sometimes
employed. Adaptive tests are claimed to provide more precise
measurement of an individual’s capabilities, since the difficulty of the
test is matched to the candidate’s ability level (Hambleton,
Swaminathan & Rogers, 1991). IRT-based and Rasch-scaled tests
are theoretically ‘distribution free’, and the measurement has the
properties of an interval scale (Bond & Fox, 2007). There is, strictly
speaking, no need to use norms for such tests. Thus, they are useful
in international assessment projects where the use of local norms
would be confusing when the scores of people from different
backgrounds must be compared directly.
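To illustrate the underlying logic, the sketch below implements a bare-bones adaptive testing loop under the Rasch model in Python. Everything here is an illustrative assumption rather than any vendor's actual engine: the item bank is simply a list of difficulty parameters, the next item is chosen where it is most informative (difficulty closest to the current ability estimate), the ability estimate is updated by Newton-Raphson, and testing stops once the standard error falls below a target.

    import math

    def rasch_p(theta, b):
        # Rasch model: probability of a correct response given
        # ability theta and item difficulty b
        return 1.0 / (1.0 + math.exp(-(theta - b)))

    def estimate_theta(responses, theta=0.0, iterations=10):
        # Maximum-likelihood ability estimate via Newton-Raphson.
        # responses is a list of (difficulty, score) pairs, score 1 or 0.
        information = 0.0
        for _ in range(iterations):
            ps = [rasch_p(theta, b) for b, _ in responses]
            gradient = sum(x - p for (_, x), p in zip(responses, ps))
            information = sum(p * (1.0 - p) for p in ps)
            theta += gradient / information
            theta = max(-4.0, min(4.0, theta))  # keep the estimate bounded
                                                # (perfect scores have no finite MLE)
        return theta, information

    def adaptive_test(item_bank, answer_item, se_target=0.4, max_items=20):
        theta, responses, used = 0.0, [], set()
        for _ in range(max_items):
            candidates = [i for i in range(len(item_bank)) if i not in used]
            if not candidates:
                break
            # A Rasch item is most informative where its difficulty
            # is closest to the current ability estimate
            i = min(candidates, key=lambda j: abs(item_bank[j] - theta))
            used.add(i)
            responses.append((item_bank[i], answer_item(item_bank[i])))
            theta, information = estimate_theta(responses, theta)
            if 1.0 / math.sqrt(information) <= se_target:
                break  # the construct has been measured reliably enough
        return theta

Each respondent thus answers a different, individually tailored set of items, yet all ability estimates are expressed on the same underlying scale.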
For practitioners who have been trained in classical psychometric
theory and who are used to interpreting test scores based on norms,
tests based on Item Response Theory (IRT) may come across as a
radical departure. Because understanding this approach to testing has
become a part of the necessary skill set for practitioners using
Internet-delivered testing, it is recommended that practitioners should
undergo the necessary training for continuing professional
development purposes, to enable them to make responsible choices
and decisions about such measures.
Internet-delivered testing is a type of computer-based
assessment that occurs when the device on which the test is
administered is linked to the computer controlling the test
administration via the Internet, rather than being on the same
computer or linked through a local area network. Internet-delivered
testing can potentially be delivered on mobile devices such as smart
phones and tablets − such devices now have more computing power
than early personal computers did. This makes it possible for
practitioners to test respondents remotely anywhere in the world.
Vendor-neutral assessment portals are Internet-based delivery
systems that enable users to combine tests from different test
publishers in one battery (usually used for remote, unsupervised
administration).
Computer-based test interpretation means that the computer
system converts the scores on a test or a battery of tests into a report.
Computer-generated reports can be as simple as a graph or table of
standard scores, or they can relate a test-taker’s test performance to
the requirements of a job, reporting on individual competencies.
Some systems can report on groups of respondents rather than
individuals. Reports can be designed as feedback for psychology
professionals, line managers, individuals or other stakeholder groups,
with the intention of using appropriate terminology and illustrations to
be understood by the target groups. Reports can be designed for
specific purposes such as development, team building or selection, or
to support career decision-making. Computer-based reporting
systems rely heavily on algorithms (mathematical formulae) that
attempt to model the judgment of a skilled test interpreter, and may be
derived from statistical calculations. The reporting systems can be part
of the test administration system and be developed by the test
publisher, or they can be incorporated in a separate technology
platform called a third-party integrated reporting system, which
may be attached to an assessment portal as described above. Note
that these third-party systems are developed independently of the test
developers and distributors. Some of these systems even enable test
users to set up their own algorithms. It is the test user’s responsibility
to ensure that the use of such systems does not result in distorted or
unfair interpretations of test results.
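As a simplified illustration of such an algorithm, the fragment below converts a standard (sten) score into narrative report text using fixed cut-offs. The score bands and wording are invented for the example; commercial reporting engines encode far more elaborate, and usually proprietary, rules.

    def narrative_for(scale_name, sten):
        # Map a sten score (1-10) to report wording using fixed cut-offs
        if sten <= 3:
            band = "well below average"
        elif sten <= 7:
            band = "in the average range"
        else:
            band = "well above average"
        return "On {}, the respondent scored {} (sten {}).".format(scale_name, band, sten)

    print(narrative_for("Conscientiousness", 8))
    # On Conscientiousness, the respondent scored well above average (sten 8).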
Computerised simulation assessments are a special type of
computer-based assessment modelling a real-life task. For instance,
the respondent may be required to respond to questions, handle
correspondence, make business decisions, and complete a task by a
given deadline. With graphics and multimedia, the experience can be
made very realistic. Gamified assessments are a form of simulation
assessment designed to be more like a computer game, with high
levels of realism and possibly even including virtual reality (VR) or
augmented reality. These technologies require special equipment
and make it possible to immerse the participant in the experience with
three-dimensional graphics that appear to surround him or her, and
with which the respondent can interact. Such simulations differ widely
in terms of the nature of the data collected and how an ‘item’ is
defined. Calculating the metric properties of such tests is not
straightforward, and it requires different ways of thinking about
reliability, generalisability, comparability, and validity (Mislevy, Oranje,
Bauer, Von Davier, Hao, Corrigan, Hoffman, DiCerbo & John, 2014).
Artificial intelligence (AI) refers to the use of computers to
perform tasks that would normally require human intelligence. In the
context of psychological assessment this can include speech
recognition and speech synthesis, translation between languages, and
decision-making. A primitive form of artificial intelligence is employed
in programmes that produce computer-generated reports, but this field
of computer science has the potential to have a much greater impact
on the field of assessment.

14.3 Computer-based and Internet-delivered assessments in South Africa
14.3.1 Early South African developments
Pioneering work was done on computerised assessment in South
Africa under the auspices of the Council for Scientific and Industrial
Research and the Human Sciences Research Council (HSRC). At the
National Institute for Personnel Research (NIPR), the PLATO-based
testing system was developed in the late 1970s. This system ran on a
mainframe-based wide area network and predated the Internet.
Testing systems using personal computers, such as the PsiTest,
Siegmund, and CATS systems, were developed at the HSRC in the
late 1980s and early 1990s. These systems incorporated a wide range
of tests that were used in South Africa at the time, and supported test
administration, scoring, norming, and basic reporting. The PLATO-
based system and PsiTest supported the management of testing
projects over wide area networks. Testing was supervised, although
the people overseeing the testing process were often not registered
psychology professionals. At the time, psychologists were allowed to
delegate ‘psychological acts’ to unregistered persons. These
computerised tests were mostly based on classical psychometric
theory and were not adaptive, although some computerised job-
simulations were developed and even some adaptive ability tests
based on Item Response Theory (IRT).

14.3.2 Technical limitations of early testing systems


Early personal computer systems were not very reliable and system
interruptions were common. The power supply and network
connectivity were also erratic. To function in such an unstable
environment, these early systems had to be robust and able to restart
after interruptions with the data intact. They also used hardware-
based security locks, called ‘dongles’, as well as passwords, to protect
the systems from unauthorised access.

14.3.3 A wider choice of tests becomes available


During the 1990s and early 2000s, several foreign test publishers
actively started marketing computer-supported testing in South Africa,
and a wider range of tests became available. Practitioners were now
faced with attractive assessment-technology options, some of which
had not been adapted for use in South Africa. During the same period,
some of the tests that were developed and distributed by the HSRC
were criticised as being biased and unfair to disadvantaged groups
(Taylor & Boeyens, 1991; Abrahams & Mauer, 1999; Abrahams,
2002), and test users were encouraged to take some responsibility to
ensure that the measures they used could be used fairly (Taylor,
1987).
Computer-generated interpreted reports from both local and
foreign suppliers became available. These reports varied in
sophistication, with the most sophisticated being able to relate
measured constructs to work competencies and report on groups of
people at a time.

14.4 Computer-based testing in the twenty-first century
14.4.1 A wider variety of tests
Since the dawn of the twenty-first century, there has been a huge
increase in the number of test providers offering Internet-delivered
assessment technology in South Africa. Test users have many new
tests, both foreign-developed and local, at their disposal. In some
cases, the track records of the tests and the test providers are not
known to the person contemplating the use of the measures. For
instance, a practitioner may find a new test using a Google search,
which may seem ideal for the intended use, but the test is not
represented by any distributor in South Africa and the Web site is
hosted in another country. Such a test would not have been tested on
or adapted for South Africans, and may even be available to persons
who are not qualified to use psychological tests in South Africa.

14.4.2 Internet-delivered testing and cloud storage


Assessment practices are moving away from standalone computers
and local area networks towards the Internet, and Internet-delivered
assessment is becoming the rule rather than the exception in most
occupational and many educational assessments. This enables tests
and testing systems to be updated more quickly and easily because
there is no need to distribute and install physical media – the updates
are done on the server computer and are instantly available all over
the world.
Internet-delivered testing presents the risk that test items can
become overexposed, which can compromise the integrity of the test.
Adaptive tests are therefore preferable for Internet-delivered testing,
especially for ability tests. However, adaptive testing is not limited to
educational and occupational assessment, and is also being
employed for screening for psychological disorders (Donadello, Spoto,
Badaloni, Granziol & Vidotto, 2017; Finkelman, Lowe, Kim, Gruebner,
Smits & Galea, 2017; Flens, Smits, Carlier, Van Hemert & De Beurs,
2016).
Mobile devices are becoming more common for Internet-delivered
testing. Tablets and smartphones are more affordable and ubiquitous
than desktop and laptop computers, and making assessments
available on such devices can avoid excluding respondents who
cannot afford the more expensive technology. However, there are
many factors that affect the bias and fairness of tests when used on
such devices. In particular, the smaller screen sizes and limited
processing power of tablets and smartphones affect which types of
test items work well and are perceived as fair (Chernyshenko & Stark,
2016). In the early days of computerised testing, improved
standardisation of the testing process was considered one of the
advantages, but now with a wide variety of devices that differ in
specification being used in different and uncontrolled situations,
standardisation is once again something that needs to be explicitly
considered.
Test results are also increasingly stored in different locations from
where the assessment takes place. Sometimes the test results are
‘mirrored’, which means they are simultaneously backed up to
different geographical locations. While this provides protection against
data being wiped out in a natural disaster, it raises questions
regarding the control the test administrator has over the protection of
personal information. South African assessment practitioners need to
ensure that the privacy and integrity of data that are stored using
online assessments are protected (Department of Justice, 2013).

14.4.3 Changes in the size and scope of assessment projects


In large-scale testing projects in occupational and educational
contexts, pencil-and-paper testing has become a rare exception rather
than the norm. Supervised or proctored testing has become much less
commonplace, with many assessments being conducted
unsupervised via the Internet. The respondent activates the testing
session, usually by clicking through on a link in an e-mail message.
Some assessment projects in corporate organisations have become
multinational and target very large numbers of respondents in multiple
countries.
Different regulatory frameworks may apply in the various countries
where respondents are completing the tests. Diverse cultural and
language factors may also apply in different countries where
respondents are being assessed for the same role. Because of the
number of respondents involved with some of these projects, it can be
very difficult for the assessment practitioner to even be aware of all
the complicating factors in such a project. Often these projects are run
by a very small number of psychology professionals. The role of the
professional assessment practitioner has indeed become endangered
by automation. The few practitioners that administer these massive
assessment projects carry a heavy ethical responsibility to ensure that
respondents are fairly treated and that their rights are not violated.
14.4.4 Changes in reporting and interpretation
The writing of integrated reports on test results used to be an
important part of the training of assessment professionals, and it was
a time-consuming task. This is changing fast. Test users have now
come to expect sophisticated computer-generated reports, tailored to
the purpose of the assessment and attractively presented in a format
and style suitable for the intended recipient. Computer-generated
reports on groups of people rather than just individuals are becoming
commonplace, as are reports that compare a candidate’s measured
profile to the requirements of a role.
Combining several measured dimensions to calculate a single
score, or ‘fit index’, that is supposed to indicate how well a respondent
matches the requirements of a role is also a common feature of
computer-generated reports. In extreme cases (which are not that
uncommon), candidates can be placed in rank order in terms of such
a fit index. The algorithms used to calculate these fit indices are not
always open to scrutiny and South African assessment practitioners
are not necessarily trained to evaluate them critically. The complexity
is usually hidden because the practitioner only sees the final result of
the calculation. Thus, decisions can be made on the basis of scores
where the measurement error is not known because different scales
have been averaged in some way. It is common in large projects for
large numbers of respondents to have an almost identical fit index,
and this can mean that employment decisions are made on the basis
of tiny differences between scores that have been calculated rather
than measured directly. With the advent of integrated reporting
systems that allow practitioners to set up the relationships between
measured dimensions and job requirements themselves, assessment
practitioners now assume additional responsibility.
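A minimal sketch of such a calculation is given below, assuming, purely for illustration, that the fit index is a weighted average of standard scores against a role profile. The dimension names, weights, and scores are invented for the example.

    def fit_index(scores, weights):
        # Weighted average of standard scores against a role profile
        total = sum(weights.values())
        return sum(scores[dim] * w for dim, w in weights.items()) / total

    role_weights = {"reasoning": 0.5, "conscientiousness": 0.3, "resilience": 0.2}
    candidate_a = {"reasoning": 7, "conscientiousness": 4, "resilience": 6}
    candidate_b = {"reasoning": 5, "conscientiousness": 7, "resilience": 6}

    # Two quite different profiles yield almost identical fit indices
    print(round(fit_index(candidate_a, role_weights), 2))  # 5.9
    print(round(fit_index(candidate_b, role_weights), 2))  # 5.8

Note how two quite different profiles collapse to nearly identical single numbers, and how the measurement error of the composite is nowhere represented, yet the candidates could be rank-ordered on that 0.1 difference.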

14.4.5 New types of assessments


One of the advantages of computerised testing is that the tests no
longer need to be limited to static, text-based multiple-choice items.
Test items can incorporate video material at various levels of realism.
Situational judgment tests (see Chapter 15), which were formerly
presented as text-only narratives, can now be done with video and
sound. Moreover, the test itself can be more responsive to the test-
taker’s answers, to make the whole experience more dynamic and
immersive. The constructs being assessed can be more complex – for
instance, formal and informal reasoning can be assessed interactively
using computer-based assessments (Teig & Scherer, 2016). The
improved realism can introduce cultural content such as the gender,
races, languages, and accents of video characters. Because
respondents may react differently to this content, it can introduce
cultural bias if it is not relevant to the construct that is being assessed
(Popp, Tuzinsky & Fetzer, 2016).
Assessments can take the form of work simulations or computer
games. The study of ‘serious games’ designed for assessment or training is becoming a research area in its own right
(Serious Games Institute, 2017), and is also receiving research
attention in South Africa at the University of the North West (Msolo,
2017). (See also Chapter 15, Section 15.2.1.4.)
Virtual content is not only becoming prevalent in assessment, but
in many learning environments where simulations are an essential
element. Behaviour must be captured at a sufficient level of detail so
that cognitive and affective states can be estimated. This involves
complicated models and decision rules (Khan, 2017).
The test-taker can complete the simulation as an individual or
several test-takers can participate at the same time. While this can
improve face validity and even content validity, such games and
simulations yield a lot of information, and it is difficult for game
designers to decide what the critical behaviours should be that would
be linked to the constructs being measured, and how they should be
weighted (Mislevy et al., 2014). Gamified assessments are often
developed for a particular client or job situation, and the consultants
doing the development are not usually psychology professionals. Such
a development team may consist of coders, interface designers, game
designers, artists, and so forth. They rely on the client to specify what
the assessment exercise should do. The psychology practitioner
should also consider the information that ‘subject matter experts’ –
people who know the requirements of the role for which the
assessment will be done – can contribute. The process of designing a
computerised simulation or assessment game brings us back to the
issue of algorithms – the assessment practitioner who commissions
the development of a computerised simulation or gamified
assessment may be required to approve the mapping of game
behaviours to constructs and the formulae that are employed to
calculate final scores. This is probably something that falls outside the
training and experience of most psychology practitioners, and yet they will be expected to assume professional responsibility for the fairness,
validity, and lack of bias of these assessments. Moreover, the
development of such gamified exercises is currently an expensive
undertaking that does not usually allow for much scope to fix mistakes
that can occur.
According to Vrana and Vrana (2017), it will soon be possible for computers to reliably administer and score intelligence tests that are normally administered individually, without any human involvement, as opposed to the computer-assisted technology currently offered by the Q-Interactive platform.
The authors believe that the developers of this platform specifically
designed the system to still require more involvement from the test
administrator than is strictly necessary, given the capabilities of the
tablet computers that are used, and that this was done to make the
system more acceptable to the professionals who would be
purchasing it. Research into online administration of school-readiness
tests also yielded promising results (Csapó, Molnár & Nagy, 2014).
Developments such as these indicate that computer-based
assessments are likely to be used in more contexts and on target
populations that may previously have been considered unsuitable for
such test administration.

14.4.6 Non-commercial resources become available for test developers and users
Assessments that take the form of interactive computer games are not
only becoming more commonplace, the production of such games,
which used to be a highly specialised and costly task, is now
becoming easier and potentially less expensive, requiring very little
programming skill if the correct tools are used. Access to tools to
develop such games with cinematic realism and a high degree of
interactivity is becoming inexpensive or even free (Unity Technologies,
2017; Epic Games Inc., 2017). If this trend continues, it could become
possible for psychology professionals who are prepared to develop
the necessary competence to assume a more direct involvement in
the development of games and simulation for assessment purposes
(Park, Schwartz, Eichstaedt, Kern, Kosinski, Stillwell, Ungar &
Seligman, 2015).
Open-source item banks and tests have become available as a
resource for test developers and users. These are sets of items that
are placed on the Internet and that may be used freely by test
developers and researchers without the need to pay royalties. An
example is the International Personality Item Pool focusing on factor-
based personality models and containing around 3 200 items with
predetermined metric qualities for the scales, based on international
research (Goldberg & Johnson, 2017). For clinicians, there are also
royalty-free tools available. The NIH toolbox from the United States
National Institutes of Health can assess cognitive, emotional, sensory,
and motor functions using tablet devices (National Institutes of Health,
2017). Promis (the Patient-Reported Outcomes Measurement
Information System) is a set of royalty-free measures that measure
physical, mental, and social health in adults and children. These are
available as computer-adaptive tests (National Institutes of Health,
2017).
Research-based information about the ability, skill or knowledge,
and personal preference requirements of over 10 000 work roles is
also available in the public domain, alongside the questionnaires used
to analyse the roles, and the tests and questionnaires to measure
these requirements (U.S. Department of Labor, 2017).
Internet users can now also create their own Web sites very easily
without needing to know how to programme a computer. Sometimes
this can be done at no cost. A Web site can also be hosted very
inexpensively, or for free, if the developer allows the site to carry
advertising. There are also many platforms that allow users to create
and disseminate survey questionnaires inexpensively or for free. It is
also possible to integrate Internet-based training and testing using
open-source platforms. The Moodle system, for instance, supports
item banks and contains a ‘quiz’ component, which has the capacity to
administer items in a variety of structured formats, as well as open-
ended questions (Moodle HQ, 2017).
Thus, while Internet-delivered assessment is today largely a
commercial enterprise, there is a wealth of open-source resources
available for test developers and assessment professionals. Using this
technology does not need to remain a prohibitively expensive
undertaking. These public-domain resources are potentially
enormously valuable, but they would need to be formally researched
in terms of their suitability for use in the multicultural South African
context.

14.4.7 Artificial intelligence


While computers may not be able to actually reason like humans yet,
they are capable of processing enormous quantities of data very fast,
they can learn, they can solve certain types of problems, and they can
behave in ways that are difficult to distinguish from human behaviour.
One of the earliest artificial intelligence programmes, Eliza, simulated a psychotherapist, and computer-delivered psychotherapy was being explored as early as the 1960s (Colby, Watt & Gilbert, 1966). Artificial-intelligence entities are already being employed to
conduct job interviews online and to provide support to users of
computer systems. Computers can be trained to analyse essays from
job applicants in a similar way to which a human rater would do so,
producing scores that are reliable, valid, and do not disadvantage
minority groups (Campion, Campion, Campion & Reider, 2016).
Usually, these are conducted via text messaging, but the quality of
voice recognition and voice synthesis is already so good that we may
sometimes be interacting with a computer online without knowing it.
Psychology practitioners are already relying heavily on computers to
generate narrative reports on test data. Most reporting engines are
coded in such a way that they rely on predetermined cut-offs and
relationships between what is measured and what the tests are
intended to predict. However, artificial neural networks (an artificial-
intelligence technology) are being utilised to determine what these
relationships should be (Cui & Guo, 2016). For this technology to
mature properly, cognitive models must be developed, high-quality
items need to be created, and psychometric techniques need to be
developed to interpret performance data properly. While these
technologies are not mature yet, they are continually developing and it
is not inconceivable that artificial intelligence may encroach
progressively on the professional role of the psychology professional
in the not-too-distant future.
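To give a flavour of this approach, the sketch below trains a small neural network to learn the relationship between test scale scores and a criterion rating from data, instead of relying on hand-coded cut-offs. The data are random stand-ins generated for the example; any real application would require validated scores, proper cross-validation, and bias analysis.

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 4))  # four scale scores for 200 respondents (synthetic)
    # Synthetic criterion: depends mainly on the first and third scales, plus noise
    y = 0.6 * X[:, 0] + 0.3 * X[:, 2] + rng.normal(scale=0.5, size=200)

    model = MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
    model.fit(X[:150], y[:150])  # train on the first 150 cases
    print("hold-out R^2:", round(model.score(X[150:], y[150:]), 2))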
Artificial intelligence is already being employed in game-based
assessments. Because interactive games offer test-takers such a wide
variety of possible behaviours and decisions, artificial intelligence
offers the possibility of what’s known as stealth assessment – the
deriving of conclusions about a person’s behaviour without them being
aware of it (Snow, Likens & McNamara, 2016). Social media is also a
rich source of behavioural evidence. When using Web sites and online platforms, people are often unaware of how much information is
being collected surreptitiously about their behaviours. Personality can
be assessed reliably and validly from the language people use on
social media (Park et al., 2015). This type of assessment is not
currently commercially available to professionals and employers, but it
is likely to become available in the future. Practitioners who follow this
approach will need to exercise extreme care regarding informed
consent and the privacy of the individuals being assessed.

14.4.8 Summary
The field of computerised assessment is developing at an ever-
increasing pace. Whereas early computerised assessments offered
improved standardisation compared to individually administered tests
(some even claimed to guarantee total standardisation), the diversity
of devices that are now used, the use of item-banked and adaptive
tests, and the increasing prevalence of unsupervised testing in
uncontrolled settings, mean that test-takers are not all experiencing
the assessment process in the same way.
Computerised assessments are appearing in new forms, and are
also targeted at populations that were not formerly included, such as
children and people with mental and emotional problems. The
emergence of artificial intelligence, simulations and gamified
assessments, and sophisticated computer-generated reports further
complicate the responsibilities and decisions of the assessment
practitioner. Because of the huge scale of many assessment projects,
many people are affected, which means that the assessment
practitioner carries a very heavy responsibility.

❱❱ ETHICAL DILEMMA CASE STUDY 14.1

Sindi Gumede is a psychometrist working for a cellular communications company. The company runs a large and busy call centre. Some of the call-
centre agents deal with customers who have technical problems. Others deal with
the opening and activation of new accounts. A third group of call-centre agents
deal with debtors, following up on clients who have missed payments and whose
services might be terminated. A fourth group call existing clients to try to sell them
additional services and upgrades. The technical-support consultants work shifts,
and the shifts are rotated so that they all sometimes have to work the night shift.
Sindi is called into a management meeting and told that the company has
decided to implement a new system for the selection of call-centre agents. The
system is a battery of tests and exercises and can be completed on a computer,
tablet or cellphone. The first part is a simulation exercise. The respondent listens to
various conversations between a client and a call-centre agent, and then rates the
call-centre agent’s response on a five-point scale from ‘poor’ to ‘good’, and this is
scored for correctness. Other parts of the battery include a non-verbal reasoning
test, a personality test, a data-capturing exercise that evaluates the accuracy and
speed with which the respondent types in information, and an error-checking test
where the respondent has to compare one set of information with another. Sindi has
never seen this battery before, and is told that the battery is new in the country, and
will give her employer a competitive edge because they will be the first to use it. It
will be rolled out within the next two months for all call-centre
selections in the company.

Questions
1. What are Sindi’s ethical and professional responsibilities
regarding the implementation of this system?
2. What can Sindi do to ensure that the system is used legally and
professionally in the company?
3. Specifically consider the fairness implications of introducing
such a system in South Africa.

14.5 Ethical and legal use of computer-based and Internet-delivered testing in South Africa
In South Africa, psychological testing is specifically regulated by the
Health Professions Act (No. 56 of 1974) and its regulations, which
also include the code of conduct for psychology professionals.
Occupational assessment is further regulated by the Employment
Equity Act (No. 55 of 1998). The Protection of Personal Information
Act (No. 4 of 2013) and the Promotion of Access to Information Act
(No. 2 of 2000) are also relevant to assessment in general. In addition
to legislation, which must be obeyed, there are also guidelines and
policies, some from the Health Professions Council of South Africa
(HPCSA), and others from international bodies such as the
International Test Commission (ITC). Policies and guidelines are
helpful in guiding ethical decision-making and making us aware of our
responsibilities. However, where policies and guidelines are
inconsistent with legislation, the laws and regulations must be obeyed.
The ITC published guidelines on computerised testing in 2005
(International Test Commission, 2005). The Professional Board for
Psychology adapted these guidelines for South Africa and published
them as a Board policy. The modified policy ruled out unsupervised
testing and the testing of respondents younger than 18 years without
the consent of parents. This was met with legal resistance from test
publishers and the modified policy was withdrawn. The Health
Professions Council has since issued guidelines regarding tele-
medicine for practitioners (Health Professions Council of South Africa,
2014). This indicates that the Council has become open to delivering
healthcare services remotely. Tele-medicine is the practice of
medicine using electronic communications, information technology or
other electronic means between a healthcare practitioner in one
location and a healthcare user in another location.
Although the ITC guidelines on computerised and Internet-
delivered testing are now becoming outdated, given the rapid
technological and psychometric developments in the field, they are still
valuable for South African practitioners since they describe the
responsibilities of test developers, test publishers, test users and test-
takers regarding computerised psychological assessments, and
highlight pitfalls where such assessments can be unfair or give
inaccurate results. They can be considered as the closest to ‘best
practice’ guidelines available to South African practitioners. South
African practitioners are advised to be alert for updates to this
document, because the guidelines are being revised (see
http://www.intestcom.org).
The ITC guidelines are a comprehensive document that assigns
specific responsibilities to test developers, test publishers, and test
users. Test users are advised to study it in more detail than we can
represent here, where we are only highlighting some of the
responsibilities of test users. Significantly, test users are expected to
have a certain level of technical understanding in terms of software
and hardware, to ensure that the equipment being used is compatible
with the test requirements, and to be able to deal with technical
problems when they occur during the testing process. From these
guidelines it is clear that it is not considered best practice for test
users to administer tests using systems on which they have not been
trained and on which they have not had sufficient experience. Test
users must ensure that respondents know how to interact with a
testing system and must not simply assume that people have the
necessary computer literacy. Test users are also required to critically
assess whether other modes of test administration would not be more
appropriate than computer-based testing for a given purpose and the
needs of a particular respondent. Test users are directed to be
discerning about the research evidence for the tests they use; they
must consider the psychometric properties and whether they are
documented, as well as the appropriateness of the measures used in
terms of the knowledge, skill, and ability requirements, as well as their
cultural appropriateness. Test users must also ensure that they
understand how the computer-generated reports work and what
limitations they may have. Test users must take care in using the
appropriate report formats for the respondent and the purpose of the
assessment. It is advised that computer-generated reports should be
edited where appropriate, to include background information and
assessment information from other sources. With regard to the level of
supervision over the testing process, the test users are directed to
consider the level of supervision for which the test was designed, and
also to ensure that the identities of test users are authenticated and
that cheating is controlled. Test users must also ensure that
respondents are not overexposed to the test items (this can be difficult
to do when job-seekers repeatedly do the same tests when applying
for jobs at different employers). Test users also have the responsibility
to verify that data are transmitted and stored securely, and regularly
backed up. The ITC guidelines further specify that test users must
abide by legal, professional, and ethical mandates in the country
where they are working (International Test Commission, 2005).
The ITC also has a guide for test-takers who are being assessed
using technology-based assessments (International Test Commission,
2010). South African test users would do well to make this document
available to the test-takers they assess. The guide explains what the
test-taker can expect during the testing session, and also what would
be expected from them. Significantly, when testing is to be conducted remotely, the guide expects test-takers to ensure that their equipment meets the technical specifications and that they know how to operate it. The test-taker
must make sure that they understand the test instructions, and also
undertake to answer the test honestly – not obtaining help from
anyone or making copies of the test materials that appear on the
screen. South African assessment practitioners need to make sure
that the people they assess using computerised assessment
technology, especially via unsupervised testing, are at a sufficient
level of technological sophistication to assume the responsibilities
expected of them.

❱❱ CRITICAL THINKING CHALLENGE 14.1

How does an assessment practitioner determine if a test-taker is sufficiently ‘technologically sophisticated’ or ‘digitally literate’ to take
a computer-based test? [Hint: look into the issue of ‘test-wiseness’,
especially in Section 17.2.3.5 in Chapter 17.]

The Employment Equity Act requires that tests used in industry should
be reliable, valid, not biased against a particular group or individual,
and that they must be able to be used fairly (Department of Labour,
2014). The requirement that such tests should also be certified was
rescinded (ATP v. the President of SA, the Minister of Labour and the
HPCSA, 2017). This means that if assessment practitioners should
decide to use a new or specialised computer-based test that has not
been classified by the Psychometrics Committee, they need to verify
that the psychometric properties of the test comply with the
Employment Equity Act. The code of conduct for the profession
requires the use of classified tests.
In terms of the Scope of Practice for the Profession of Psychology
(Department of Health, 2008), testing and assessment of a long list of
psychological constructs are reserved for the profession of
psychology, and it is a criminal offence for unregistered people to
perform this activity. It is also no longer allowed for psychology
professionals to delegate psychological acts to unregistered people.
The question of whether the administration of a structured test, or, by
extension, a computerised test, constitutes a psychological act is still
not entirely clear. In his 2010 judgment, Judge Bertelsmann refrained
from declaring that it is not illegal for an unregistered person to
administer psychological tests. However, at the time there was no
gazetted list of classified tests. The judge expressed the opinion that the mere administration of tests does not in itself constitute a psychological act – it only becomes one when the tests are interpreted for the purpose of assessing psychological constructs – but nevertheless he did not issue a declaratory order to confirm this
(Bertelsmann, 2010). At the time of writing this chapter, new
regulations regarding the use and control of psychological tests are in
development, and if these are passed the situation may change.
Psychometrists have a vested interest in protecting the administration
of tests as something reserved for registered professionals.
With regards to computer-based assessment, practitioners should
ask themselves whether administration of the tests, or the testing
project, requires professional skill and judgment. If it is totally
straightforward, it may be acceptable to delegate this task. But if there
are complex considerations that affect the fairness of the assessment
process, it should not be delegated. Even if a routine action is
delegated, professional responsibility can never be delegated to an
unregistered person. In this regard it should be noted that Internet-
delivered assessment projects may be becoming more complex where
assessment is done in different geographical locations, on people with
different degrees of technological familiarity, and on different devices.
The code of conduct for the profession of psychology (Department
of Health, 2006) specifies that assessment must always take place
within a defined professional relationship. Informed consent must
always be obtained when assessment is done using automated or
Internet-based testing, even if it is a routine assessment such as a job
application where consent could be implied. Informed consent also
involves the clarification of why the assessment is being done, the
possible consequences and risks of doing the assessment, and the
consequences of refusing to do it or withdrawing from the process
once it has begun. The practitioner is also obliged to discuss the
limitations of the data obtained from such assessments with clients
(respondents themselves and also possible third parties such as
employers). Practitioners should take personal responsibility for the
reports and feedback done on such assessments. This may mean that
computer-generated reports need to be further edited to add
background information, or to triangulate information to ensure that
important decisions are not based on single assessments.
Establishing the professional relationship when Internet-based
testing is used could be a difficult process, especially when one
practitioner is responsible for the assessment of a large number of
people in different geographical locations. It may be possible to
establish the relationship via a telephone call or an electronic text
conversation such as Skype, or via an online meeting platform where
the assessment professional can address a group of people at a time.
Even with the use of such technologies, this can be a challenge if
there are hundreds of respondents and only one practitioner. There is
at present no easy solution, and this aspect is often overlooked with
assessment systems that, with one click, send out hundreds of emails
containing links to tests to test-takers whom the assessment practitioner
has never even met.
All ethical considerations regarding psychological services apply to
computer-based assessment. The practitioner should treat clients as
individuals with dignity and personal worth. All clients should be
treated equally, even if some find the completion of the tests a
frustrating technical challenge.
Although computer-based assessments represent a significant
labour-saving technology, it is difficult to do them properly and
professionally. Assessment practitioners should guard against
becoming complicit in a dehumanising and excessively automated
assessment process where the rights of the test-takers may be
compromised. This may require them to take a stand in their places of
employment, and towards the suppliers of assessment technologies,
against the de-personalisation and de-professionalisation of
psychological assessments.
14.6 Conclusion
In this chapter a brief overview of the rapidly developing field of
computerised assessment was provided. The history of computer-
based assessment in South Africa was presented. It was noted that
computerised assessments are appearing in new forms (e.g.
computerised simulation assessments), a variety of devices are being
used, unsupervised testing in uncontrolled settings is commonplace,
and computer-generated reports have become more sophisticated.
These further add to the responsibilities of the assessment practitioner
and the decisions required. The chapter ended with a discussion
about the ethical and legal use of computer-based and Internet-
delivered testing in South Africa.

❱❱ CRITICAL THINKING CHALLENGE 14.2

Consider the following statement from the Ethical Rules of Conduct (Department of
Health, 2006, paragraph 44(7)):
‘When any electronic, Internet or other indirect means of
assessment is used, the psychologist concerned shall declare this
and appropriately limit the nature and extent of his or her findings’.
What is your reaction to this statement? In what way does computer-
based or Internet-delivered assessment limit the nature of the assessment
findings, and is this any different to the limitations experienced when
performing traditional paper-based or performance-based testing?

This was the last stop in the Types of Measure Zone. Here are a few
Web sites where you can obtain additional information on, and
evaluations/reviews of, assessment measures: https://buros.org
(Buros Center for Testing); http://ericae.net (ERIC Clearinghouse on
Assessment and Evaluation); and http://cresst.org (CRESST, the
National Center for Research on Evaluation, Standards, and Student
Testing).
As you look back over this zone, there are a number of critical
concepts that you should have grasped in Chapters 10 to 14. By
answering the following questions you will be able to assess whether
you have mastered the critical concepts in this zone.

CHECK YOUR PROGRESS 14.1


14.1 What is intelligence?
14.2 How can cognitive functioning be assessed? Provide some
examples of both individual and group measures of general
cognitive functioning and specific cognitive functions.
14.3 How can affect, adjustment, and coping be assessed? Give
some examples of measures.
14.4 What is personality and how can its various dimensions be assessed?
Give some examples of different types of personality measures.
14.5 What is career assessment? Give some examples of the
measures used in career assessment.
14.6 Explain what is meant by the terms ‘Internet-delivered testing’,
‘computerised adaptive testing’, and ‘computerised simulation
assessments’, and highlight some of the advantages and challenges
of such testing.
14.7 Discuss some of the issues that need to be considered to
ensure that computer-based and Internet-delivered testing is
conducted in ethical and legal ways.

The final stretch of the journey lies ahead. You will be travelling in the
Assessment Practice Zone again, where you will learn to apply what
you have learned to the everyday practice of assessment.
Assessment Practice Zone (2)
We are now in Part 2 of the Assessment Practice Zone and you are
entering the final section of your journey! There is still quite a lot to
be seen and digested at the remaining stops.

Part 4
15 You will explore how the different types of measures to which you have been introduced are applied in a variety of contexts.
16 You will explore issues related to interpreting measures and reporting on assessment results.
17 You will investigate the factors that influence the outcome of an assessment.
18 And finally, you will be able to take stock of progress being made regarding making assessment more meaningful and African-centred, and take a peek at some of the next steps. Enjoy the rest of the journey!
Chapter 15
The use of assessment measures in various applied contexts
CHERYL FOXCROFT, LOUIS STROUD, FRANCOIS DE KOCK, AND GERT ROODT

CHAPTER OUTCOMES
In this chapter, you will be required to draw on your knowledge of fair and ethical assessment practices (covered in Chapters 1, 7, 8, 9, and 14) and the cognitive, affective, personality, interest, and career measures that were covered in Chapters 10 to 13 of the Types of Measure Zone. We will discuss how these and other measures are applied in certain contexts.
At the end of this chapter you will be able to:
› explain how psychological assessment is applied in industry
› explain what developmental assessment is, what functions are assessed, and
the different types of measures used
› describe some potential assessment innovations for children emerging from
therapeutic contexts
› explain how assessment measures are used in educational contexts
› understand how psychological assessment is used for psychodiagnostic
purposes in a variety of contexts, including psycholegal areas.

15.1 Introduction
In this chapter you will be introduced to how psychological
assessment is applied in industry, for the purposes of developmental
assessment, in educational contexts, and for psychodiagnostic
purposes. In the process some of the measures discussed in previous
chapters will be referred to and you will also be introduced to other
assessment measures.

15.2 Assessment in industry


Assessment in industry can be divided into three broad areas. The
first area is concerned with the psychological measurement of
attributes of individuals in the workplace. Most assessment measures
used for this purpose can be classified as psychological measures or
techniques. The second and the third areas are respectively
concerned with the assessment of groups and organisations. In the
latter two cases, the measures used are not necessarily classified as
psychological measures, although the principles of test construction
(classical test theory or psychometric theory) are used in their
development. For an interesting overview of issues related to
assessment in organisations, refer to Bartram (2004).

15.2.1 Assessment of individuals


Individuals in the workplace are assessed for various reasons. The
goals of individual assessment are to assess:
• individual differences for selection and employment purposes
• inter- and intra-individual differences for placement, training,
promotion and development, as well as for compensation and
reward purposes.
Psychological measures or other assessment instruments that comply
with the reliability and validity requirements are used in personnel
selection, performance appraisal, situational tests, and assessment
centres. We will discuss each of these contexts briefly.
15.2.1.1 Personnel selection
Perhaps the most widely used function of psychological measures in
industry is to assist in screening, selection, and employment decisions
of job applicants. Personnel selection is a process of decision-making
about the suitability of applicants for a particular position (Salgado,
2017). Psychological assessment measures generate useful data that
supplement other sources of personal information about a candidate,
such as biographic information, work history, personal references, and
interview information. Other initial screening devices listed by Cascio
and Aguinis (2005) are honesty (integrity) tests (see Section 17.3.4.3
in Chapter 17); Weighted Application Blanks or Biographical
Information Blanks; drug tests; and even polygraph tests.
Basically, two broad approaches are used in the application of
psychological measures for selection purposes. In the first instance,
individuals are compared with the job specifications in terms of their
personal characteristics or personality traits. This approach is called
an input-based approach because individuals’ characteristics are
matched with what is required from a job. This approach is also
referred to as the psychometric evaluation or testing approach.
For instance, a fisherman must have tenacity, patience, and the ability
to resist boredom. What do you think the requirements are for a driver
of a passenger bus? Several psychological assessment measures
can be used to assess these characteristics or traits.
The second approach is an output-based approach where
individuals are compared in relation to the required output standards
of a job. In this instance, the aim is to determine whether the individual
has the necessary competencies to perform a particular task or job.
This approach is also referred to as the competency assessment
approach (see Sections 1.1 and 2.2.5 in Chapters 1 and 2
respectively). The ability to read fluently, to write, to operate specific
machinery, or to drive a vehicle skilfully are all examples of
competencies that might be required to perform a particular job. What
competencies do you think are required to lead a specialist police
reaction team that deals with abductions and hostages? Measures
that are used to assess a person’s competencies reliably and validly
are not normally classified as psychological measures, although they
should nonetheless be reliable and valid competency assessments.
See Nicholls, Viviers, and Visser (2009) for an example of the
validation of a test battery for call-centre operators.
The basic assumptions underlying these two assessment
approaches are fundamentally different. In the first approach, a
predictive approach is followed where personal characteristics are
matched with job requirements or specifications. In the latter
approach, a person’s competencies are assessed in order to
determine whether they meet minimum performance criteria or
standards. The challenge in both these approaches, however, remains
to address the potential issues around bias and adverse impact. All
assessment measures have to demonstrate that they are neither
biased nor display any adverse impact when applied in a multicultural
or diverse context. What is adverse impact? ‘Adverse impact in
personnel selection refers to the situation where a specific selection
strategy affords members of a specific group a lower likelihood of
selection than is afforded members of another group’ (Theron, 2009,
p. 183).
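To make this idea concrete, the sketch below (a minimal Python illustration with hypothetical applicant numbers, not an example drawn from the Act or from Theron) compares the selection rates of two groups. The 0.8 cut-off reflects the widely cited 'four-fifths' rule of thumb from the selection literature; it is a screening heuristic, not a South African legal standard.

    # Minimal sketch: flagging possible adverse impact by comparing
    # selection rates (hypothetical numbers for illustration only).

    def selection_rate(selected, applicants):
        return selected / applicants

    def adverse_impact_ratio(focal_rate, reference_rate):
        """Focal group's selection rate relative to the reference group's."""
        return focal_rate / reference_rate

    rate_a = selection_rate(selected=12, applicants=60)   # 0.20
    rate_b = selection_rate(selected=30, applicants=75)   # 0.40

    ratio = adverse_impact_ratio(rate_a, rate_b)          # 0.50
    if ratio < 0.8:  # four-fifths rule of thumb
        print(f"Possible adverse impact: ratio = {ratio:.2f}")

A ratio well below 1 means that members of one group have a substantially lower likelihood of selection than members of the other, which should prompt closer investigation of the selection strategy.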
Psychological tests can also be classified as tests of maximal or typical performance (Huysamen, 1978). Maximal performance tests include aptitude tests (tests focused on assessing a person's performance potential) and academic performance or ability tests (tests that assess what a person is able to do or has learned up to his or her current stage of development). Typical performance tests include personality and interest tests, which can assess work-related characteristics or traits in a reliable and valid manner. Other measures listed by Cascio and Aguinis (2005) in this regard are projective techniques, personal history data, and peer assessments.
Some of these measures of maximal and typical performance are presented under the categories in the sections that follow. (Owing to a lack of space, we list only a small selection of measures under each category; this is by no means a complete list of available measures.)
The most popular instruments in South Africa for assessing a
person’s different work-related aptitudes are:
• Technical Test Battery (Mechanical and Spatial Reasoning) (distributed by Psytech SA)
• Clerical Test Battery (distributed by Psytech SA)
• Quanti-Skill Accounting Assessment (QS) (distributed by JvR Psychometrics)
• Customer Service Aptitude Profile (Customer Service AP) (distributed by JvR Psychometrics).
The most popular instruments in South Africa for assessing a person’s
work-related abilities are:
• TRAM-1, TRAM-2, APIL, and the APIL SV, instruments developed for assessing a person's capacity to learn, linked to different education levels ranging from illiterate to post-matric qualifications (developed by research psychologist Dr Terry Taylor and distributed by Aprolab)
• General Reasoning Test Battery (distributed by Psytech SA)
• Critical Reasoning Test Battery (distributed by Psytech SA)
• General Ability Measure for Adults (GAMA®) (distributed by JvR Psychometrics)
• Learning Potential Computerised Adaptive Test (LPCAT) (developed by Professor Marié de Beer and distributed by JvR Psychometrics)
• Wechsler Adult Intelligence Scale (WAIS-IV) (distributed by JvR Psychometrics)
• Cognitive Process Profile (CPP), a computerised simulation exercise that measures the cognitive capabilities, preferences, and potential of adults in the work environment. Constructs measured are cognitive styles, learning potential, a suitable work environment, and a person's cognitive strengths and development areas (developed by Dr Maretha Prinsloo and distributed by Cognadev)
• Learning Orientation Index (LOI), a computerised simulation exercise that measures the cognitive capabilities, preferences, and potential of school leavers and college and university students for purposes of career guidance and cognitive development (developed by Dr Maretha Prinsloo and distributed by Cognadev)
• The Verify Range of Ability Tests, a battery of seven tests measuring differential abilities, namely Verbal reasoning, Numerical reasoning, Inductive reasoning, Checking, Calculation, Mechanical comprehension, and Reading comprehension (developed and distributed by CEB, formerly SHL).
The most popular instruments in South Africa for assessing a person’s
personality are:
• Occupational Personality Profile (OPP) (distributed by Psytech SA)
• 15 Factor Questionnaire Plus (15FQ+) (distributed by Psytech SA)
• Occupational Personality Questionnaire® (OPQ32®), an occupational model of personality, which describes 32 dimensions or scales of people's preferred or typical style of behaviour at work (distributed by CEB)
• 16PF® Personality Questionnaire (16PFQ®) (initially developed by Cattell, Saunders, and Stice (1950) and distributed by JvR Psychometrics)
• Hogan assessment series (the Hogan Personality Inventory – HPI and Hogan Development Survey – HDS) (distributed by JvR Psychometrics)
• Myers-Briggs Type Indicator® (MBTI®) (distributed by JvR Psychometrics)
• Hogan Motives, Values, and Preferences Inventory (MVPI) (distributed by JvR Psychometrics)
• The Motivational Profile (MP), a Web-delivered computerised 'game' that measures various aspects of motivational behaviour, including a person's life script, self-insight, energy levels, energy themes, operational effectiveness, defence mechanisms, and personality (developed by Dr Maretha Prinsloo and distributed by Cognadev)
• The Value Orientations (VO), a Web-delivered computerised questionnaire that assesses a person's worldview, valuing systems, as well as perceptual and organising frameworks (developed by Dr Maretha Prinsloo and distributed by Cognadev).
The most popular instruments in South Africa for assessing a person’s
work-related interests are:
• Occupational Interest Profile (distributed by Psytech SA)
• Campbell™ Interest and Skills Survey (CISS®) (distributed by JvR Psychometrics)
• Strong Interest Inventory (distributed by JvR Psychometrics)
• Career Interest Profile Version 3 (CIP v.3) (distributed by JvR Psychometrics)
• The Career Path Finder – Indepth (CPI), which has been developed to assess an individual's career interests and preferences (developed and distributed by CEB).
The most popular instruments in South Africa for assessing a person’s
work-related competencies are:
• Broad Band Competency Assessment Battery (BB-CAB) for assessing general competencies for people on NQF levels 4 and 5 (developed by Dr Terry Taylor and distributed by Aprolab)
• Competency Assessment Series (CAS) (distributed by JvR Psychometrics)
• Job Specific Solutions, which are job-based test batteries containing various test components that are highly job-relevant and predictive of success in the target job. A Job Specific Solution typically measures three to ten competencies identified as relevant to the target job (developed and distributed by CEB)
• Graduate Talent Simulations, which combine job-realistic, compelling, and highly engaging content into predictive assessments. These simulations measure five of the most important and predictive competencies for success in a team-based work environment, through state-of-the-art 3D-animation technology (developed and distributed by CEB).
In South Africa, test use in industry is restricted to a list of classified
psychological tests (see Board Notice 155 of 2017; RSA Government,
2017). So it is very important that test users (in other words, those
appropriately registered in the profession of psychology) ensure that
the measures they wish to use have been appropriately classified by
the Psychometrics Committee of the Professional Board of
Psychology. Classification ensures that psychological tests are used
only by appropriately registered psychology professionals. The review
of the tests performed during the classification process provides
valuable information for psychology professionals to determine
whether the tests they use adhere to, amongst others, the
requirements of the Employment Equity Act (No. 55 of 1998) in that
tests should be valid, reliable, fair, and unbiased. Other important
criteria for tests that are relevant in industry include positive applicant
reactions (e.g. test-takers should be able to see the ‘line of sight’
between the test content and the requirements of the target role) and
test utility (e.g. the return-on-investment for a particular measure
should make financial and practical sense). Finally, it is critical that
your choice of tests can be appropriately justified in terms of the job
relevance of these measures. One of the best ways to accomplish this
is to ensure that a thorough job analysis is conducted to identify not
only the critical performance areas of the position, but also to highlight
the key person characteristics (e.g. knowledge, skills, abilities, and
other characteristics) required for effective performance in these
areas.

❱❱ CRITICAL THINKING CHALLENGE 15.1

You are the in-house psychologist for Company A. You are requested to
assemble a test battery for selecting candidates from diverse groups for
a management-development course. You have to make use of aptitude,
ability, interests, and personality measures. Based on what you have
learned so far, which criteria will you apply for selecting measures from
the different lists above? Motivate your choice of criteria.

15.2.1.2 Performance ratings or assessment


Principles of psychometric theory (or Classical Test Theory) should
also be applied in the assessment or rating of a person’s job
performance (for interest’s sake, go to any South African work
organisation and see if this is indeed the case). In assessing job
performance, there are also input- and output-based approaches. The
input-based approach refers to the evaluation of a person’s input,
such as personality traits, personal attributes, or characteristics that
are important for achieving high performance standards in a job. In the
output-based approach a person’s achieved outputs are rated
against required output standards (specified job competencies) as
specified by the job requirements.
When performance appraisals are conducted, they should also be
reliable and valid (see Aguinis, 2013, for an in-depth discussion of
performance management). For this reason, the performance-
assessment measures should ideally be based on psychometric
principles, and be evaluated for reliability and validity aspects. It has
been found that the use of multiple assessors in performance reviews
normally results in more reliable and valid assessments. Lately, the
use of the so-called 360-degree competency (multiple-source)
assessments is gaining ground based on sound validation research
(also see Hoffman, Gorman, Blair, Meriac, Overstreet & Atchley 2012;
and Theron & Roodt, 1999, 2000), as opposed to the traditional,
single-source performance appraisal where a ratee’s performance is
only rated by a single rater. (Do you know any organisations where
this practice is still in use?)
15.2.1.3 Interviews
Interviews can either be classified as structured or unstructured. In the
structured interview, there is a clearly defined interview schedule
that specifies the content and the order of questions during the
interview. The responses to specific questions may also be closed or
fixed, meaning that the respondent has to choose between a limited
number of options provided. Questions are normally structured in such
a way that they elicit specific reactions or responses from the
candidate, which are then evaluated for appropriateness. Such
structured interviews may still yield highly reliable results, as was
found with the use of structured, competency-rating interviews
(Cronbach Alpha = .94) (Roodt, 2007).
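For readers who want to see how such a coefficient is obtained, the following minimal Python sketch computes Cronbach's alpha for a hypothetical matrix of structured-interview ratings (candidates in rows, interview questions in columns). The figures are invented for illustration and are not Roodt's (2007) data.

    import numpy as np

    def cronbach_alpha(scores):
        """Cronbach's alpha for a respondents x items matrix:
        alpha = k/(k-1) * (1 - sum of item variances / variance of totals)."""
        scores = np.asarray(scores, dtype=float)
        k = scores.shape[1]
        item_variances = scores.var(axis=0, ddof=1)
        total_variance = scores.sum(axis=1).var(ddof=1)
        return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

    # Hypothetical ratings: 5 candidates x 4 structured interview questions
    ratings = [[4, 4, 5, 4],
               [2, 3, 2, 2],
               [5, 4, 5, 5],
               [3, 3, 3, 2],
               [4, 5, 4, 4]]
    print(round(cronbach_alpha(ratings), 2))  # 0.94 for these invented data

The closer alpha is to 1, the more consistently the questions rank the candidates relative to one another.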
Unstructured interviews do not have any particular schedule,
and lack the structure and order of the structured interview.
Responses to questions may in this instance be totally open. Trained
interviewers (such as experienced psychologists) can still elicit
valuable information from such unstructured interviews.
A word of caution should be raised against the use of unstructured interviews (conducted by unqualified and/or inexperienced persons) for screening, selection, and employment decision purposes. In many organisations, the practice of conducting employment interviews and basing employment decisions solely on their outcomes still dominates (and may even be unlawful). Can the people who conduct such interviews provide you with evidence of the interviews' reliability (consistency of application) and validity (relevance in terms of job content)? The proper function of employment interviews is to supplement other assessment data and to verify outstanding or contradictory information.
15.2.1.4 Situational tests
There is a wide range of situational techniques and measures that can be used for making employment decisions for appointments in supervisory and managerial positions. Most of these situational tests are used in an assessment centre (AC) or situational judgement test (SJT) context.
Assessment centres (ACs) are best described as a combination of exercises such as role plays, simulations, in-basket tests, and leaderless group exercises (each discussed later in this section), in which candidates are observed by a number of trained observers for specified behavioural dimensions applicable to a specific job (e.g. that of a manager or a supervisor). The International Guidelines (2015, p. 1248) define an AC as:

An assessment center consists of a standardized evaluation of behavior based on multiple inputs. Any single assessment center consists of multiple components, which include behavioral simulation exercises, within which multiple trained assessors observe and record behaviors, classify them according to the behavioral constructs of interest, and (either individually or collectively) rate (either individual or pooled) behaviors. Using either a consensus meeting among assessors or statistical aggregation, assessment scores are derived that represent an assessee's standing on the behavioral constructs and/or an aggregated overall assessment rating (OAR).

Similarly, Schlebusch and Roodt (2008) define ACs as a simulation-based
process employing multiple assessment techniques and multiple
assessors to produce judgements regarding the extent to which a
participant displays selected competencies required to perform a job
effectively. ACs are usually employed for development or selection
purposes. Several variations exist within these two broad categories.
Schlebusch and Roodt argue that a more structured and systematic
approach towards AC analysis, design, implementation, and
evaluation could enhance their reliability and validity.
The basic premise of ACs is that different simulations (e.g. in-
basket; counselling discussion; group meeting) are presented in which
participants’ different sets of competencies relevant to a position (e.g.
initiative; information gathering; judgment; control) are observed and
assessed (rated) according to predetermined rules. The participants’
behaviours in simulations are observed and rated by different trained
observers (raters). Rating outcomes are scored on a matrix based on
the mix of different exercises and competencies. The matrix is drawn
by combining different exercises (in columns) and the relevant
competencies (in rows). Scores are allocated for the different
competencies in each simulation that result in some cells being left
blank where those competencies are not measured (observable) in an
exercise. Consequently, competency ratings can be totalled across
exercises (columns) and exercise scores totalled across
competencies (rows). An Overall Assessment Rating (OAR) can be
calculated by adding all exercise scores or by adding all competency
scores.
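The scoring logic of such a matrix is easy to illustrate. The minimal Python sketch below (hypothetical competencies, exercises, and ratings; not taken from any specific AC) marks unobserved cells as missing and shows that the two routes to the OAR agree, since both sum the same observed cells.

    import numpy as np

    # Hypothetical competency (rows) x exercise (columns) matrix for one
    # participant; NaN marks cells where a competency is not observable
    # in that exercise.
    competencies = ["initiative", "information gathering", "judgement", "control"]
    exercises = ["in-basket", "counselling discussion", "group meeting"]
    ratings = np.array([
        [4.0, np.nan, 3.0],
        [3.0, 4.0, np.nan],
        [np.nan, 3.0, 4.0],
        [4.0, np.nan, 3.0],
    ])

    competency_totals = np.nansum(ratings, axis=1)  # each row totalled across exercises
    exercise_totals = np.nansum(ratings, axis=0)    # each column totalled across competencies

    oar = exercise_totals.sum()                      # Overall Assessment Rating
    assert np.isclose(oar, competency_totals.sum())  # both routes give the same OAR
    print(dict(zip(competencies, competency_totals)), oar)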
ACs have gained in popularity in recent times because of their high
face validity. Candidates can relate to the exercises and simulations
because they perceive them as relevant and appropriate in the work
context. A word of caution, however, needs to be expressed. ACs
must be evaluated in terms of their content validity, the training and
experience of observers, the selection of exercises, and the predictive
validity of these centres. Roodt (2008) provided a number of practical
suggestions to improve the reliability and validity of ACs. These
suggestions relate to improving the consistency of:
• the context in which ACs are conducted
• the presentation of the simulations
• the administration and observation processes
• the role plays and the performance of the role players.
The more consistent these aspects are kept when ACs are conducted,
the less the error variance that will occur across different applications.
Some years ago, Pietersen (1992) cautioned users and supporters of ACs and called for a moratorium on their use. His plea was based on the relatively poor predictive validity of ACs at that point in time. More recent developments in sophisticated research and statistical methods (such as structural equation modelling and meta-analyses) have produced more conclusive evidence that AC scores validly predict job performance (compare Arthur, Day, McNelly & Edens, 2003; Thornton & Rupp, 2006; Thornton & Gibbons, 2009) and explain a substantial amount of variance in it (Ones, 2008).
Situational Judgment Tests (SJTs) are measurement methods
that present applicants with job-related situations and possible
responses to these situations (Lievens, 2006; Lievens, Peeters &
Schollaert, 2008; Whetzel & McDaniel, 2009). Different variations of
ACs and SJTs have gained popularity in the recent past for making
selection and employment decisions. The face validity of these
assessment techniques plays a major role in this state of affairs. We
will discuss some of these techniques briefly below:
Simulations
Simulations (role plays) attempt to recreate an everyday work
situation. Participants are requested to play a particular role and to
deal with a specific problem. Other role players are instructed to play
certain fixed roles in order to create consistency in different
simulations. Trained observers observe the candidate to assess
specified behavioural dimensions. Reliability and validity of
simulations can vary according to the training and experience of
observers, knowledge of the behaviours that need to be observed,
and the consistency of the other role players.
Vignettes
Vignettes are similar to simulations but are based on a video or film
presentation in which the candidate is requested to play the role of a
particular person (such as the supervisor or manager) and to deal with
the problem displayed in the vignette. Vignettes are more consistent in
terms of the presentation of the scenario than simulations, but are
open to many possible solutions from which a candidate may choose.
Leaderless group exercises
In this instance, a group of candidates is requested to perform a
particular task or to deal with a specific problem while being observed.
Trained observers rate the leadership qualities and other behavioural
dimensions that the candidates display during their interactions.
In-basket tests
The in-basket test typically consists of a number of letters, memos, and reports of the kind that the average manager or supervisor is confronted with in his or her in-basket or e-mail inbox. The candidate is requested to deal with the correspondence in an optimal way. The candidate's responses are then evaluated by a trained observer.
Serious games
Serious games present test-takers with highly interactive stimuli that
may come in the form of video, audio, or other multimedia, in order to
measure their personality traits, interests, abilities, or other
characteristics. These games attempt to conceal the dimensions that
are being measured in order to avoid socially desirable responding or
‘faking’. Furthermore, they try to engage test-takers in a more fun,
challenging, and immersive experience than traditional 'static' assessment measures do. As these tools are still in their infancy, however, a serious potential issue is the degree to which they show construct validity, avoid contamination with other constructs, and adhere to other desirable psychometric characteristics (see also Section 14.4.5).
All the above-mentioned situational tests or other similar assessments
should still comply with the requirements of reliability and validity as
specified in the Employment Equity Act. This can be achieved by the
training of raters, the selection of content-valid exercises or
simulations, standardised assessment procedures, and standardised
stimuli (settings). Ultimately, assessment practitioners also have to
conduct reliability and validity analyses on their assessment data in
order to provide validation evidence.

15.2.2 Assessment of workgroups or work teams


As indicated earlier, the second broad category of assessment is
focused on the assessment of workgroups or work teams (or
subgroups such as dyads and triads) in terms of their unique
processes, characteristics, and structures (also refer to Bartram, 2004;
and Van Tonder & Roodt, 2008, in this regard). These types of
assessments are mainly used for diagnostic and development
purposes and are not classified as psychological measures, but would
most probably yield more valuable results in the hands of an
experienced industrial psychologist.
Let us first explain what is meant by the terms processes,
characteristics, and structures in a group context:
• Group processes: Aspects such as leadership, conflict-handling (Rahim, 1989), negotiation, communication, group dynamics, and decision-making are some of the group processes that can be assessed. Other relevant group behavioural processes that can be assessed are groupthink or group shift, power in groups, and conformity to group norms.
• Group characteristics: Groups can be assessed and categorised in terms of their own unique characteristics, such as their level of cohesion, group development stages, dominant leadership style, conflict-handling and negotiation styles, level of trust, or group effectiveness criteria.
• Group structure: Groups can also be assessed in terms of their structure, such as the status of members in terms of their primary reference group (in-group or out-group), their respective team roles in the group, their sociometry (social networks or interaction patterns) (Hart & Nath, 1979), and their different status levels within the group or within subgroups.
The results obtained from group assessments are used for inter-group
or even intra-group comparisons. Although groups are the targets for
assessment, the assessment data are still obtained at an individual
level. In other words, subjective individual assessments are collated to
describe group processes, characteristics or structures. Measures
used to assess group processes, characteristics, and structures have
to be based on sound psychometric principles in order to be reliable
and valid.
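As an illustration of such collation, the sketch below aggregates individual members' ratings into a group score and checks whether members agree enough for the mean to be meaningful. (This is a hedged illustration: the agreement index used is the within-group agreement coefficient r_wg of James, Demaree, and Wolf (1984), one common choice in the organisational literature rather than a method prescribed by this chapter, and the data are hypothetical.)

    import numpy as np

    def rwg(ratings, n_options):
        """Within-group agreement r_wg for one item on an A-point scale:
        1 minus the ratio of observed variance to the variance expected
        if members responded randomly (uniformly)."""
        observed = np.var(ratings, ddof=1)
        expected = (n_options ** 2 - 1) / 12.0  # variance of a uniform response
        return 1 - observed / expected

    team = [4, 5, 4, 4, 5]   # five members rate, say, cohesion on a 1-5 scale
    print(np.mean(team))                     # group-level score: 4.4
    print(round(rwg(team, n_options=5), 2))  # agreement: 0.85

High agreement supports describing the group by a single score; low agreement suggests that the 'group' property may not exist as a shared perception at all.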

15.2.3 Assessment of organisations


As indicated before, the third broad category of assessment focuses
on the assessment of organisations (or organisational sub-units such
as sections or divisions) in terms of their unique processes,
characteristics, and structures (also refer to Bartram, 2004; and Van
Tonder and Roodt, 2008 in this respect). This group of assessments is
also largely used for diagnostic and development purposes and they
are also not classified as psychological measures. But these
measures would also yield more valuable results in the hands of an
experienced industrial psychologist.
We need to explain first what is meant by the terms organisational
processes, other behavioural processes, characteristics, and
structures in an organisational context:
• Processes: Organisational communication, corporate climate and culture (e.g. Van der Post, De Coning & Smit, 1997; Dannhauser & Roodt, 2001), organisational socialisation (e.g. Storm & Roodt, 2002), and mentoring (e.g. Janse van Rensburg & Roodt, 2005), as well as organisational change processes (e.g. Kinnear & Roodt, 1998; Coetzee, Fourie & Roodt, 2002), are some of the organisational processes that can be assessed.
• Other behavioural processes: Work alienation, employee engagement, job involvement, burnout, work stress, and organisational commitment, as well as stereotyping and prejudice, are relevant behavioural processes in the organisational context. A host of local and international measures exist (too many to list) that can assess these processes.
• Characteristics: Organisations can also be evaluated in terms of their identity, strategic style, internal politics, organisational effectiveness criteria, organisational policies, practices, and procedures, as well as other similar characteristics.
• Structure: Organisations can be evaluated in terms of the different design options and their different structural configurations, as well as their operating models or value disciplines (Potgieter & Roodt, 2004).
The results obtained from organisational assessments are used for
inter-organisational comparisons or even intra-organisational
comparisons.
Although organisations are the targets for assessment in this
instance, the assessment data are still obtained at an individual level.
In other words, subjective individual assessments are collated to
describe an organisational process, characteristic or structure.
Measures used to assess organisational processes, characteristics,
and structures also have to be based on sound psychometric
principles in order to be reliable and valid.

15.2.4 Organisational research opportunities


Knowledge of psychometric theory is useful in industry as far as the
application and refinement of psychological assessment measures for
assessing individual (and intra-individual) differences are concerned.
Furthermore, knowledge related to psychological measurement can
also be applied to the assessment of group and organisational
characteristics, processes, and structures in a valid and reliable way.
In this regard, many research opportunities present themselves as
there is a lack of research into measures that tap group and
organisational processes, characteristics, and structures.

❱❱ CRITICAL THINKING CHALLENGE 15.2

You work in the human resources division of a large motor-manufacturing company. You are asked to assess a team of engineers who design new cars to ascertain how they handle conflict. Using your knowledge of different types of assessment measures, as well as your knowledge about the basic principles of psychological measurement, how would you set about this assessment? What measures and techniques, both psychological and other, would you use, and what would you do to ensure the validity and reliability of your assessment findings?

ADVANCED READING
Aguinis, H. (2013). Performance management. 3rd edition. Englewood Cliffs, NJ: Prentice-Hall.
Schlebusch, S. & Roodt, G. (2008). Assessment centres: Unlocking potential for growth. Randburg: KnowRes Publishers.
Scott, J.C. & Reynolds, D.H. (2010). Handbook of workplace assessment. Hoboken, NJ: Wiley.

15.3 Infant and preschool developmental assessment
Imagine a one-week-old baby. What do you think such a small infant
can and cannot do? Perhaps, at this age, you would expect the baby
to be able to suck from a bottle, or the mother’s breast; to grasp your
finger; and to turn its head towards you when you touch its cheek. You
would certainly not expect a week-old infant to sit up and talk to you or
walk over to you and, when hungry, ask you for a sandwich. You
would, however, expect such behaviour of a 3-year-old child. These
observed changes in behaviour that take place over time are referred
to as developmental changes, and provide us with guidelines to judge
whether a child’s behaviour is age-appropriate or not.
Just as you have observed developmental changes in children,
many others have also done so and some of them have become well
known for their theories of human development. Undoubtedly, one of
the most prominent of these was Charles Darwin (1809–1882), who is
hailed by many as the scientific father of child development. Although
the long-term impact of Darwin’s contribution has been seriously
questioned by critics (e.g. Charlesworth, 1992), there is sufficient
evidence to support the fact that his work provided a catalyst for a
number of renowned developmental psychologists (e.g. Stanley Hall,
Arnold Gesell, Jean Piaget, and Sigmund Freud), whose theories form
the basis of measures designed to assess the age-appropriateness of
an infant or child’s development.

15.3.1 Why do we assess development?


Long before scientific studies of children were carried out, it was an
accepted fact that the early years are critical in the child’s
development. This was expressed in the old Chinese proverb, ‘As the
twig is bent, so the tree’s inclined’. In a more poetic way, Milton
expressed the same fact when he wrote, ‘childhood shows the man,
as morning shows the day’ (Hurlock, 1978, p. 25). The growth and
development that occurs during these early years of childhood has a
large impact on later development (Luiz, 1997). Since the early 1970s
it has been internationally recognised that the identification of children
with difficulties should take place as early as possible. These
difficulties cover a vast spectrum and range, from difficulties of
movement to difficulties of speech. Thus, the rationale for assessing a
child’s development at an early age is simple: the sooner a child’s
difficulties can be identified, the sooner an intervention can be
implemented and, hence, the sooner a child can be assisted. To
ensure that children with developmental delays are provided with high-
quality, holistic care and services at an early stage, undertaking
developmental assessment is critical. The last decade has witnessed
an increase in the number and use of standardised developmental
assessment measures in healthcare and paediatric settings (Radecki,
Sand-Loud, O’Connor, Sharp & Olson, 2011). According to Moodie,
Daneri, Goldhagen, Halle, Green, and LaMonte (2014), developmental
screening ‘provides a quick snapshot of a child’s health and
developmental status and indicates whether further evaluation is
needed to identify potential difficulties that might necessitate
interventions or special education services’ (p. 2). Such further
evaluation often requires that the child is assessed using a more
comprehensive diagnostic measure to identify the nature and severity
of developmental delays, generally or in specific areas, and the
developmental strengths the child has. Consequently, as the use of
standardised developmental screening continues to grow, so too will
the need for comprehensive diagnostic assessment using measures
such as the Griffiths III (described in Chapter 10, Section 10.4.1).
Considering South African history and the current context, there
are many children who do not have access to adequate resources and
learning opportunities, and thus fall within the disadvantaged bracket.
Without some form of assessment, these children may go unnoticed
and their developmental difficulties will only be detected when they fail
to master the developmental milestones. It may prove more
challenging to intervene in the older child’s life and sadly the window
of opportunity may be missed. Therefore, the goal of early childhood
assessment within the South African context should be to ‘acquire
information and understanding that will facilitate the child’s
development and functional abilities within the family and community’
(Shonkoff and Meisels, 2000, p. 232). Greenspan and Meisels (1996)
define developmental assessment as ‘a process designed to deepen
the understanding of a child’s competencies and resources, and of the
care-giving and learning environments most likely to help a child make
fullest use of his or her developmental potential’ (p. 11).

15.3.2 What functions do we assess?


Brooks-Gunn (1990), in her book Improving the Life Chances of
Children at Risk, stressed that the measurement of the well-being of a
child includes the assessment of physical, cognitive, social, and
emotional development. Thus, a comprehensive developmental
assessment should include these four aspects of functioning, which
are not mutually exclusive. A problem in one area may have an effect
on another area. For example, a socially and emotionally deprived
child may present with delayed cognitive development as a result of
the lack of social and emotional stimulation. However, if this problem
is detected at an early age and the child is provided with intensive
stimulation, the cognitive deficit may disappear after a period of time.
The holistic view of child development cannot be emphasised enough,
and by implication a holistic perspective to the developmental
assessment of South African children is vital in view of the poor social
conditions the majority of South African children are experiencing
(Luiz, 1994).

15.3.3 How do we assess?


Developmental measures can be categorised as either screening or
diagnostic measures. Developmental screening involves a short,
structured evaluation of aspects of development, which attempts to
identify children who may be potentially at risk for developmental
difficulties (Squires, Nickel, and Eisert, 1996). Some screening
measures can be administered by non-specialists (e.g. parents,
educators, clinic nurses) who have been trained to use them. In
comparison to screening measures, diagnostic measures can be
described as in-depth, comprehensive, individual, holistic measures
used by trained professionals. The aim of diagnostic measures is to
identify the existence, nature, and severity of the problem.
As with diagnostic measures, screening measures often assess a
child holistically. Typically, they examine areas such as fine- and
gross-motor coordination, memory of visual sequences, verbal
expression, language comprehension, social-emotional status, and so
on. Screening measures are cost-effective as children can be
effectively assessed in a short space of time. The results from
screening measures are generally qualitative in nature, in that they
categorise the child’s performance, rather than provide a numerical
score (Foxcroft, 1997a). Furthermore, screening measures usually
provide an overall view of the child’s development, rather than
information relating to specific areas. Comprehensive diagnostic
measures, on the other hand, provide numerical scores and/or age
equivalents for overall performance as well as for each specific area
assessed. As both types of measures have advantages and
disadvantages, any comprehensive plan of developmental
assessment should incorporate both screening and diagnostic
instruments (Brooks-Gunn, 1990).

15.3.4 What developmental measures are available


for South African children?
What is the state of developmental assessment in South Africa?
During the last decade or so, researchers have made a concerted
effort to address the need for more reliable and valid developmental
measures for South African children. The focus of this research has
been twofold, namely the construction of new culture-reduced tests
(e.g. Grover-Counter Scale (HSRC, 1994)) and adapting, refining, and
norming appropriate tests that have been constructed and proven to
be valid and reliable in other countries (e.g. the Kaufman Assessment
Battery for Children, and the Bayley, McCarthy, and Griffiths Scales).
We will briefly describe the developmental measures currently in
frequent use with South African children. It must, however, be
emphasised that the authors do not claim that these are the only
measures in use.
15.3.4.1 Global developmental screening measures
Denver II
The Denver Developmental Screening Test (DDST) was first
published in 1967 and revised in 1990 as the Denver II (Frankenburg
& Dodds, 1990; Frankenburg et al., 1990). The Denver II can be
administered to children from 0 to 72 months (six years). The domains
tapped are personal-social development, language development,
gross- and fine-motor development. The results obtained from the
Denver II categorise the child’s current development into one of three
categories, namely, Abnormal, Questionable or Normal development
(Nuttall, Romero & Kalesnik, 1992). The Denver II was developed in
the US and, although research on the applicability of the scales is
underway in South Africa (Luiz, Foxcroft & Tukulu, 2004), there are,
as yet, no norms available for South African children.
Vineland Adaptive Behaviour Scales, Third Edition (Vineland-3)
The Vineland-3 (Sparrow, Cicchetti & Saulnier, 2016) provides
information that assists in the identification and diagnosis of cognitive
and developmental delays. The scales assess personal and social
competence of individuals from birth to adulthood. The scales
measure adaptive behaviour in the following core domains:
Communication, Daily living skills, and Socialisation as well as in the
following two domains that are optional to include: Motor skills and
Maladaptive behavior (Sparrow, Cicchetti & Saulnier, 2016). They do
not require the direct administration of tasks to an individual, but
instead require a respondent who is familiar with the individual’s
abilities and general behaviour (e.g., caregivers, teachers) to
participate in a structured interview and/or complete structured forms.
This makes the scales attractive to use with individuals with special
needs (e.g. a child with hearing or motor difficulties).
Goodenough-Harris Drawing Test or Draw-A-Person (DAP) Test
This test (Harris, 1963) is utilised with children between the ages of 5
and 16 years. It involves the drawing of a person, which is scored
against 73 characteristics specified in the manual. Alternatively,
drawings can be compared with 12 ranked drawings (Quality Scale
Cards) for each of the two scales. The results provide a non-verbal
measure of mental ability (Nuttall, Romero & Kalesnik, 1992).
The Draw-A-Person Intellectual Ability Test for Children,
Adolescents, and Adults (DAP:IQ) by Reynolds and Hickman (2004)
allows for the estimation of cognitive ability of an individual. This is
achieved through the provision of a common set of scoring criteria
against which intellectual ability can be estimated from a human figure
drawing (HFD).
Mpofu (2002) reports on how the Draw-A-Person test was adapted
in Zambia to make it more Africa-centred. Not all preschool children in
rural African villages have been exposed to using a pencil or crayon to
draw something. So instead of using a pencil and paper, children are
given wet clay and asked to form a person from the clay. The resulting
figure is scored using set criteria. The inter-rater reliability is high (.89)
(Kathuria & Serpell, 1999). The test is fittingly named the Panga
Munthu Test (Make-A-Person Test). Performance on the Panga
Munthu Test was significantly better than performance on the paper-
and-pencil DAP version (Mpofu, 2002). This is a good example of the
importance of using media, stimuli, and so on for test tasks that are
familiar and contextually relevant for a test-taker.
Other tests
Other internationally used measures include the following: Battelle
Developmental Inventory Screening Test (BDIST) (Newborg, Stock,
Wnek, Guidubaldi & Svinicki, 1984); Developmental Profile II (DP-II)
(Alpern, Boll & Shearer, 1980); and the Preschool Development
Inventory (PDI) (Ireton, 1988). However, research relating to their
applicability, reliability, and validity with South African children is
lacking. Therefore, the results obtained by South African children on
these scales should be interpreted with caution.
15.3.4.2 Educationally focused screening measures
It is important to assess the extent to which young children are
developing the necessary underpinning skills that are the essential
building blocks for being able to successfully progress in the early
school years. Research has shown that the sooner children at risk for
scholastic difficulties are identified and the earlier that appropriate
intervention and educational instruction are instituted, the better the
chances are that these children will develop to their full potential
(Nuttall, Romero & Kalesnik, 1992).
Educationally focused screening measures tend to tap
important underpinning skills related to school learning tasks that are
predictive of early school success. These skills include the following:
cognitive skills, language, motor skills, copying shapes, concept
development, memory, perceptual processes, lateral preference, and
lateral (left-right) discrimination. If you compare these skills with the
ones tapped by the measures that we considered in Section 15.3.4.1,
you will notice that there is a considerable overlap in the skills
assessed by more educationally focused screening measures and
measures designed to assess global developmental progress and
delays (Gredler, 1997). In this section, we will look at some of the
measures that are used to assess whether a preschool or Grade 1
child is at risk for developing scholastic difficulties. The spotlight will
mainly fall on measures that have been developed in South Africa.
School-readiness Evaluation by Trained Testers (SETT)
The SETT (Joubert, 1988) is an individual measure that aims to
determine whether a child is ready for school or not. The SETT is
administered in such a way that a learning situation, similar to that in
the classroom, is created. The SETT provides an evaluation of a
child’s developmental level in the following areas: language and
general (or intellectual) development, physical and motor
development, and emotional and social development.
Aptitude Test for School Beginners (ASB)
The ASB (HSRC, 1974) can be administered on a group or individual
basis. It aims to obtain a comprehensive picture of specific aptitudes
of the school beginner and evaluates the cognitive aspects of school
readiness. The ASB comprises eight tests: Perception, Spatial,
Reasoning, Numerical, Gestalt, Coordination, Memory, and Verbal
comprehension. The instructions are available in the seven most
commonly spoken African languages and norms are available for a
variety of different groups of children.
Other tests
Other batteries designed elsewhere in the world that are used in
South Africa include Developmental Indicators for Assessment of
Learning – Third Edition (DIAL-3) (Mardell-Czudnowski & Goldenberg,
1990) and Screening Children for Related Early Educational Needs
(SCREEN) (Hresko, Hammill, Herbert & Baroody, 1988).
While the measures discussed above are largely standardised
assessment batteries, a number of individual measures can be
grouped together to form a battery that assesses school readiness
abilities. Individual measures often used include the following: Revised
Motor Activity Scale (MAS) (in Roth, McCaul & Barnes, 1993);
Peabody Picture Vocabulary Test – Third Edition (PPVT-III) (Dunn &
Dunn, 1981); Developmental Test of Visual-Motor Integration – Fifth
Edition (VMI-V) (Beery, Buktenica & Beery, 2005); Teacher
Temperament Questionnaire (TTQ); the Children’s Auditory
Performance Scale (CHAPS) (Smoski, Brunt & Tannahill, 1998;
Wessels, 2011); Stress Response Scale (SRS); and the Personality
Inventory for Children – Second Edition (PIC-2).
15.3.4.3 Diagnostic measures
In this section we will focus on measures that provide a more
comprehensive assessment of development, which enables the
results to be used for diagnostic purposes. We will focus on
international measures that have been adapted for use in South
Africa. Among these measures are the Griffiths Scales of Child
Development – Third Edition (Griffiths III), which was described in
Chapter 10 (Section 10.4.1.1) and will thus not be repeated here. Two
other developmental measures not discussed in Chapter 10 are the
McCarthy Scales of Mental Abilities and the Bayley Scales of Infant
Development.
McCarthy Scales of Mental Abilities (McCarthy Scales)
The McCarthy Scales are used for children between the ages of 3 years 6 months and 8 years 6 months. There are 18
separate tests, which are grouped into five scales: Verbal, Perceptual
performance, Quantitative, Motor, and Memory. When the Verbal,
Quantitative and Perceptual performance scales are combined, an
overall measure of the child’s intellectual functioning is obtained.
Bayley Scales of Infant Development – Second Edition (BSID-II)
The BSID-II, a revision of the original scales (Bayley, 1969), is used to identify children between the ages of 1 and 42 months (3 years and 6 months) with developmental delays or those suspected of being 'at risk'. The BSID-II is divided into three scales: the Mental Scale, the Motor Scale, and the Behaviour Rating Scale. The use of all three scales provides a comprehensive assessment. Norms for interpreting the performance of black South African children on the original Bayley Scales published in 1969 are available (Richter & Griesel, 1988). However, no South African norms are available for the revised scales.
15.3.4.4 Shortcomings of developmental assessment
Among the shortcomings of developmental assessment in South
Africa identified by Luiz (1994) are that:
• specific measures are standardised for specific cultural, socio-economic, and language groups to the exclusion of others, and there are limited standardised measures that assess the development of black preschool children
• specific measures are standardised for specific age groups to the exclusion of others
• due to the specificity of measures regarding age ranges and cultural groups, related research is fragmentary in nature, resulting in limited generalisability of research findings
• measures do not always provide a holistic picture of development as socio-emotional functioning is not always assessed.
A key criterion for assessment in the twenty-first century is that it must
be authentic. The implications for developmental assessment are that
assessment tasks should be situated in, and draw on, activities of
daily living and use materials that are familiar to the child (Kim &
Smith, 2010). The need to adapt, refine, and norm appropriate tests
that have been constructed and proven valid and reliable in other
countries for use in the South African context has been emphasised
and identified as a priority. Through adapting assessment measures
such as the Wechsler Intelligence Scale for Children – Fifth Edition
(WISC-V) and the Wechsler Pre-School and Primary Scale of
Intelligence – Fourth Edition (WPPSI-IV), other diagnostic
developmental measures used widely in the international arena
become more accessible to the South African context. Concurrently,
there is a need to also develop more indigenous measures that tap
more culture-specific aspects of child development and which use
contextually relevant assessment tasks.
15.3.4.5 Information from other sources and observations
Everyone involved with the care and education of children will have
observed mental development in the practicalities of day-to-day
situations. Such observations combined with established common
knowledge are the basis for our everyday understanding of where a
particular child is in his or her development (Preston, 2005). Other
than objective psychological assessment information, valuable
developmental information can be obtained from a variety of sources
(e.g. the parent(s)/caregiver, crèche-mother, preschool teacher).
Indeed, the use of multidisciplinary teams to assess development is
spreading (Preston, 2005).
Abubakar, Holding, Van de Vijver, Bomu, and Van Baar (2009)
argue that the absence of sufficient professional assessment
practitioners and psychometrically sound assessment measures,
‘restricts the ability to implement early intervention monitoring
programmes in developing countries’ (p. 291). They argue further that
given the lack of assessment resources in sub-Saharan Africa, there
is a ‘need for approaches to identify at-risk children that do not require
a high level of training, are cost-effective, and acceptable within the
communities involved’ (p. 291). These views led them to develop a
structured interview that became known as the Developmental
Milestones Checklist. The checklist is completed by interviewing
caregivers and it is scored. Psychometrically, the checklist
demonstrated good levels of validity and reliability. Consequently,
Abubakar et al. (2009) concluded that using an inexpensive resource,
such as a structured interview that enables caregivers to report on a
child’s development, ‘is a viable method to identify and monitor at-risk
children in sub-Saharan Africa’ (p. 291).
Naturalistic observation of a child’s play can also provide the
assessment practitioner with valuable information as to whether the
child is capable of cooperative play, whether the child can initiate
activities on his or her own, and so on. The emphasis is on the
observation of the whole child within the child’s total environment at a
point in time (Preston, 2005). This information can be integrated with
the assessment results and the observations of the assessment
practitioner to form a holistic picture of the young child’s development.
There is a close link between assessment and therapy.
Assessment results inform therapeutic interventions, which, in turn,
deepens the psychological practitioner’s understanding of the child’s
level of functioning, behaviour, emotional state, and so on (Fritz,
2016). In the process, activities undertaken in therapy such as the use
of creative expressive arts (painting, using clay, music, dance and
movement, using sand-trays and sand play) or using the Mmogo-
method® (Roos, 2008, 2012) can take on the form of an assessment
technique. This provides opportunities for the psychology professional
to observe a child’s behaviour and level of functioning, especially with
children who have experienced trauma.

ADVANCED READING
These readings provide you with African-centred assessment
tasks and more information about the use of creative expressive
arts in therapy, and for assessment and observation purposes.
Fritz, E. 2016. Flowing between assessment and therapy through
creative expressive arts. In R. Ferreira (Ed.), Psychological
assessment: Thinking innovatively in contexts of diversity (pp.
103−209). Cape Town: Juta and Company.
Knoetze, J. 2013. Sandworlds, storymaking, and letter writing: The therapeutic
sandstory method. South African Journal of Psychology, 43, pp. 459−469.
Mpofu, E. 2002. Indigenization of the psychology of human intelligence
in sub-Saharan Africa. In W.J. Lonner, D.L. Dinnel, S.A. Hayes &
D.N. Sattler (Eds.), Online Readings in Psychology and Culture (Unit
5, Chapter 2). Center for Cross-Cultural Research, Western
Washington University, Bellingham, Washington USA.
Richards, S.D., Pillay, J. & Fritz, E. 2012. The use of sand-tray techniques
by school counsellors to assist children with emotional and behavioural
problems. The Arts in Psychotherapy, 39, pp. 367−373.
Roos, V. 2008. The Mmogo-method: Discovering symbolic
community interactions. Journal of Psychology in Africa, 18(4), pp.
659−668.
Roos, V. 2012. The Mmogo-method: An exploration of experiences through
visual projections. Qualitative Research in Psychology, 9(3), pp. 249−261.

❱❱ CRITICAL THINKING CHALLENGE 15.3

Given the interesting, potentially African-centred activities emerging in therapeutic contexts in South Africa, what process needs to be
followed to transform them from opportunities for observing
behaviour to psychological assessment measures?
[Hint: look at Chapters 1 and 8 to gain insight into what makes an activity
a ‘psychological measure’. Also look at Chapters 6 and 7, which focus
on the development and adaptation of psychological measures.]

❱❱ ETHICAL DILEMMA CASE STUDY 15.1

Faith, who is 5 years old (i.e. 60 months), was assessed on the Griffiths III after
the psychologist obtained informed consent from Jenny, Faith’s caregiver. Faith’s
parents both work full time, so Jenny took her to the assessment.
The assessment was started in English, but when it became
apparent early on that Faith found tasks challenging, the language
of assessment was switched to Afrikaans to deliver instructions.
The tester, who was a psychologist, was sufficiently fluent in both
English and Afrikaans to ensure that translations were appropriate.
Faith appeared quite anxious throughout the assessment and
continually sought reassurance from the tester administering the
items whenever she was unsure. Her weak grip strength was noted
and appeared to affect her hand and eye coordination the most.

Figure 15.1 Profile of Faith’s performance on the Griffiths Scales

Faith was assessed on all five subscales of the Griffiths III, namely: Foundations of
Learning (Subscale A); Language and Communication (Subscale B); Eye and Hand
Coordination (Subscale C); Personal, Social, and Emotional development (Subscale D);
and Gross Motor Skills (Subscale E), and her general development was also calculated.
On Subscales A, B, C, and D, Faith performed at a developmental
age of 3 years, which is two years below her chronological age of 5 years. Her
performance on these four subscales can be described as extremely low when
compared to other children of the same age. Faith’s best performance was on
Subscale E. She performed at a developmental age of 4 years on this
subscale, which is one year below her chronological age. Her performance
can be described as borderline in comparison to children of the same age.
Overall, Faith obtained a general developmental age of 3 years, which is two years below her chronological age. Her general developmental functioning can therefore be described as extremely low in comparison to other children of the same age.
When interpreting the two profiles in Figure 15.1 (which reflect the Scaled Scores and Developmental Quotients that Faith achieved), note that the dotted line indicates average functioning, while a score below the dotted line indicates below-average functioning.
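(As a side note on the arithmetic behind such profiles: a ratio developmental quotient is traditionally computed as DQ = developmental age ÷ chronological age × 100, although modern instruments such as the Griffiths III report norm-based standard scores rather than simple ratios. The minimal Python sketch below applies the traditional formula to Faith's figures from the case study.)

    def ratio_dq(developmental_age_months, chronological_age_months):
        """Traditional ratio developmental quotient (DQ)."""
        return 100 * developmental_age_months / chronological_age_months

    # Faith: general developmental age of 3 years (36 months) at a
    # chronological age of 5 years (60 months).
    print(ratio_dq(36, 60))  # 60.0, well below the average of 100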

Questions
(a) Did the psychologist employ good, ethical assessment
practices? Motivate your answer.
(b) Were there any ethical dilemmas? How did the psychologist
negotiate the dilemma(s)?
(c) Why do you think the psychologist used a diagnostic, as
opposed to a screening measure?
(d) Summarise Faith’s level of development and indicate what her
relative strengths and weaknesses are.

15.4 The role of psychological assessment in education
15.4.1 The uses of psychological measures in
educational settings
15.4.1.1 Uses for school-age learners
Much has been written about the introduction of outcomes-based
education (OBE) in South Africa. In OBE, among other things,
educators need to tailor their instruction to meet the unique learning
needs of each learner. This implies that the teacher needs to know
what the learners’ strengths and weaknesses are and their level of
functioning at the start of a grade, or new education phase, or learning
programme. You might think that teachers would only be interested in
knowing the status of development of a learner’s academic
competencies (e.g. how many words a child is able to recognise,
whether the child can perform basic arithmetic computations, and so
on). In fact, knowing something about a child’s overall cognitive,
personal, and social development, the development of critical skills
(e.g. the ability to listen attentively or to work in a group), as well as
certain attitudes and behaviours (e.g. an eagerness to learn, attitude
towards studying, and study habits), is equally important. It is
particularly in this regard that psychological measures can be used
along with measures of academic achievement to identify learners’
strengths and weaknesses. The information gained about learners
from such an assessment will assist educators in planning their
instructions more effectively and referring learners with special needs
for appropriate intervention and help.
We will use a practical example to illustrate some of the uses of
psychological measures in educational settings as outlined above, as
well as to alert you to some ethical considerations. Foxcroft and
Shillington (1996) reported on a study in which they were approached
by a group of Grade 1 teachers to assist them in identifying children
with special-education needs prior to the start of Grade 1, so that the
educational input could be tailored to the learners’ needs. As a large
number of children had to be assessed, a measure that could be
administered in a group context was necessary. Furthermore, as the
children were from diverse linguistic backgrounds, the measure
needed to be administered in their home language. Many of the
children came from disadvantaged backgrounds and had not had the
opportunity to participate in formal early childhood development
programmes. Consequently, to ensure that there was no
discrimination against these children, the selected measure had to
provide the children with an opportunity to practise test tasks before
they were actually assessed on them. For the assessment to provide
useful information, Foxcroft and Shillington (1996) also had to ensure
that appropriate aspects of the children’s functioning were assessed.
They therefore chose to use the School-entry Group Screening
Measure (SGSM) as it assesses underpinning abilities that are critical
for academic progress in the early grades, namely visual-motor ability,
reasoning, attention and memory, and visual-spatial ability.
The SGSM is, furthermore, a group test, the administration
instructions are available in different South African languages, and
each subtest starts with practice items. The reason for assessing the
children before the start of Grade 1 was explained to the parents and
informed consent was obtained from them. Parents were very
positive about the fact that the teachers were making such an effort to
identify whether their children had specific needs that required targeted
intervention. Based on the SGSM results, observations made by the
teachers about the children’s test behaviour, and information obtained
from an interview with the parents (conducted by the teachers),
children who were at risk academically were identified. The
instructional programme for these high-risk children was tailored to
enrich their development specifically in the areas in which they had
difficulty. When all the Grade 1 learners were re-assessed on the
SGSM in June of the Grade 1 year, what do you think was found?
The developmental gains made by high-risk children were almost
twice as great as those made by their low-risk peers. Furthermore, by
the end of the year, 68 per cent of the high-risk children were found to
be ready to tackle Grade 2. Thus, although these children entered
Grade 1 at a developmentally disadvantaged level in relation to their
low-risk peers, the instruction that they received helped them not only
to catch up developmentally, but also to cope successfully with the
academic demands of Grade 1. The remaining 32 per cent of the high-
risk children who were not able to successfully master the academic
demands of Grade 1 were referred to a psychologist for more
comprehensive, psychodiagnostic assessment (see Section 15.5 to
discover more about diagnostic assessment). Some of these children
were found to have specific learning problems that required remedial
intervention, while others were found to be cognitively challenged and
thus needed to be placed in a special school where the academic
demands were less intense. A few of the high-risk children were found
to have behavioural disorders (e.g. Attention Deficit Hyperactivity
Disorder). These children required therapeutic intervention from a
psychologist and referral to a paediatrician for possible prescription of
medication.
The above example illustrates a number of important aspects of
the use of psychological assessment in educational contexts. Before
you read any further, go through the material again and try to list what
you have learned about psychological assessment.
Using the example cited above, some of the critical aspects of the
use of psychological assessment in educational contexts are that:
• it is important to clearly match what needs to be assessed with educational or curriculum objectives and learning outcomes. This implies that psychological measures that are widely used in educational settings should be researched in terms of their predictive validity and might have to be adapted so that they are aligned more closely with the learning outcomes of the various grades
• care needs to be taken to cater for the linguistic and other special needs of learners during the assessment so that learners are not unfairly disadvantaged by the assessment
• it is important that assessment practitioners and educators inform parents regarding the purpose of the assessment and obtain their informed consent
• the results of an appropriately performed psychological assessment can be of invaluable benefit to the teacher, assisting him or her in planning how best to facilitate learning and development in his or her learners
• psychologists and teachers can work together to gather assessment data that can inform appropriate educational intervention and programme development
• assessment assists in identifying learners with special educational needs and talents who may need to be given specialised educational, accelerated, or academic development programmes and who might benefit from psychological intervention. Children with special-education needs include those with learning problems, those who are mentally challenged, those who are physically challenged, those who are gifted (highly talented), and those who have behavioural disorders that make it difficult for them to progress scholastically.
15.4.1.2 Uses for educational accountability purposes
Achievement tests are often used to provide feedback on what school
learners know and can do at various grade levels. The achievement
test information is often aggregated for schools, school districts,
provinces, and countries. Educational officials then use this
information ‘to hold teachers and others “accountable” for students’
performance on the test’ (Sireci & Gandara, 2016, p. 190).
Large-scale international comparative tests, such as the Trends in
International Mathematics and Science Study (TIMSS) and the
Progress in International Reading Literacy Study (PIRLS), are
administered to school learners around the world, including South
Africa. Countries are then ranked according to the results.
Dr Vijay Reddy of the HSRC and Professor Sarah Howie, formerly
of the Centre for Evaluation and Assessment at the University of
Pretoria, have been the national coordinators of the TIMSS and PIRLS
studies in South Africa. Their work has been highly acclaimed
nationally and internationally. They are part of a small pool of
educational and psychological measurement experts in the country. Of
concern is the fact that while South African measurement experts
make significant contributions to international projects, their expertise
is not utilised for national assessment projects. For example, the
Department of Basic Education runs the Annual National
Assessments (ANA). The test papers for ANA are set by two teachers
and two moderators (who are specialist subject advisers) per subject.
In 2012, the Grade 9 national mathematics average was 13 per cent.
Not only was there a public outcry about this, but when the
mathematics test was scrutinised, errors were found, which made it
impossible to answer some of the questions, and the general
credibility of the test was questioned as a result (reported in the
Sunday Times, 9 December 2012, p. 14). Contrast this with the South
African 2011 TIMSS results that were released a week after the ANA
results. The average percentage obtained by Grade 9 learners in the
TIMSS study was 35 per cent. While this is still a shocking result, it is
significantly higher and more realistic in terms of previous research
findings than the 13 per cent average obtained in the ANA.
Throughout this book, reference has been made to the care that
needs to be taken when developing measures to ensure that they are
valid and reliable so that appropriate inferences can be made from the
findings. It would appear that the ANA could benefit from the
measurement and psychometric rigour that is put into an international
project such as the TIMSS. Measurement experts should be called
upon to assist with the ANA or other national tests such as the Grade
12 examinations. If not, the credibility and the accuracy of the
conclusions drawn from the results will continue to be questioned.
Many international comparative studies suggest that the
development of South African learners in key literacies and subjects
falls below that of learners in other countries. Rather than merely
blaming educators and the educational system for this, emphasis should
be placed on how to turn these results around. Look back at the study
reported on in Section 15.4.1.1. Based on the TIMSS and PIRLS results,
educational intervention and development activities could be devised
to enhance the literacy and mathematics competencies of South
African learners. The implementation of these interventions needs to
be appropriately resourced.

15.4.2 Types of measures used in educational settings


15.4.2.1 Achievement measures
These can be defined as measures that tap the effects of relatively
standardised instructional input. Achievement measures are often
summative or provide a final/terminal evaluation of a learner’s status
on completion of a programme or grade. Evaluations administered at
the end of a specific course or time span are examples of
achievement measures.
Some achievement measures have been developed for certain
learning domains, such as reading and mathematics for various grade
levels, and are usually normed across the entire country. However,
this is not currently the case with achievement measures in South
Africa. This is due to the fact that during the apartheid years, there
were many different education systems and curricula, and during the
past two decades national curricula have been revised; the current
guiding policy is the Curriculum Assessment Policy Statements
(CAPS) released in 2011. Consequently, there are currently very few
standardised achievement measures available in South Africa that are
appropriately aligned with the new curriculum. Furthermore, there are
no standardised achievement measures available that have been
appropriately normed for all South Africans. Criterion-referenced
achievement tests are gradually becoming more popular than norm-
referenced tests. This is due to the fact that criterion-referenced
achievement tests allow for a learner’s performance to be pegged on
a continuum of learning outcomes of increasing complexity. This
provides a rich description of what a learner knows and is able to do
and indicates which aspects need to be developed in order to reach
the next step of the continuum. You can consult Foxcroft (2005) for a
more complete discussion on criterion-referenced achievement tests.
Teacher-developed measures are used more commonly in South
Africa, as teachers need to find a means of validly assessing learning
performance in the classroom. Ideally, the teacher would select a
variety of assessment measures or tasks to best evaluate the
learners’ ability or proficiency. The choice of measure should be
based on the nature of the information that the teacher wants to
obtain, or the purpose of the assessment. For example, if the teacher
wants to assess knowledge acquisition, it would be better to use a
multiple-choice-type question, which taps that specific knowledge
domain. However, if the teacher needs to discover the levels of
analysis, integration, critical thinking capacity, and the ability of the
learner to synthesise diverse information, it would be more appropriate
to use an essay-type question.
An additional form of achievement assessment is that of portfolio
assessment. This is a collection of examples of a learner’s work
over a period of time and allows the learner’s progress to be
documented. The decision regarding which material should be
included in the portfolio is often a collaborative effort between the
learner and teacher, who together gather examples of the learner’s
work. This has the additional advantage of allowing the learner the
opportunity for self-assessment.
15.4.2.2 Aptitude measures
Aptitude measures are used to assess and predict the specific ability
that a person has to attain success in an area where that specific
ability is required (see Chapter 10). For example, if an aptitude
assessment reveals that a person has an aptitude for art, the
assumption is that he or she has many of the characteristics and
capabilities to become a successful artist. There are different types of
aptitude measures. For example, a numerical aptitude test may
predict the learner’s future performance in mathematics, while a test of
two- or three-dimensional perception may assist in the selection of a
learner for an architecture course.
15.4.2.3 General and specific cognitive measures
Measures of general intelligence (see Chapter 10 for examples) are
used in educational settings for a number of reasons. A child’s
intelligence quotient or IQ serves as a guideline regarding the child’s
accumulated learning in a variety of learning environments. The IQ
scores obtained will indicate the child’s intellectual functioning, which
may, in turn, assist the psychologist or teacher to make decisions
regarding the child’s potential. Individual analysis and interpretation of
subtest scores, called a scatter analysis, provide indications regarding
a child’s vocabulary, comprehension, short- and long-term memory for
both words and numbers, problem-solving ability, and visual-spatial
ability. This will assist the psychologist to provide guidance to the
child’s teacher about possible remediation. A difference in scores on
verbal and performance abilities may indicate specific learning
problems, while the manner in which the child has approached the test
situation as well as the level of concentration and attention span
shown by the child provide valuable qualitative information.
Measures of cognitive ability should never be administered in
isolation, and interpretation of the results should take the entire child
into account. This means that the child’s emotional state, family
background, physical health, or specific trauma will all have some
influence on test scores. The use of a single test score to make
decisions regarding the child should be seen as an unethical
assessment practice.
Measures of specific cognitive functions refer to such functions
as visual and auditory sequential memory or specific learning tasks
(see Chapter 10). The selection of specific cognitive measures is
usually dictated by the specific problems experienced by the child in
the classroom. A child who is experiencing problems, for example with
phonics, will be assessed on specific language-related measures so
that the nature of the problem can be diagnosed. The accurate
diagnosis of a problem enables appropriate remedial intervention to
be instituted.
15.4.2.4 Personality-related measures
Personality-related measures include inter alia personality
measures and measures of temperament, motivation, task orientation,
and interpersonal styles. Chapters 11, 12, and 13 provided you with
information about these types of measures. Here, we will focus on the
use of such measures in educational contexts.
Different personality types and different temperaments may impact
on academic performance. By assessing personality and
temperament, the teacher can be assisted in gaining insight into the
child’s personality characteristics and how best to handle the child.
For example, the child whose natural temperament is relaxed and
confident will cope better with the demands of the classroom than one
who is naturally more anxious and fearful. If both children have a
specific learning problem, the confident child may respond well to
remediation, while the more anxious child may withdraw into further
emotional problems. The teacher would need to tailor her teaching to
suit the nature of the child by offering more challenges to the first and
more nurturing to the second.
15.4.2.5 Dynamic assessment
Dynamic or guided assessment is adaptable and is closely aligned
with the approaches used in tests of potential where learning
opportunities are provided as part of the testing process (see
Chapters 10 and 17). The potential of a learner is assessed by
observing how well he or she can learn in a situation where the
assessment practitioner is the assessor, instructor, and clinician
(Anastasi & Urbina, 1997). In dynamic assessment, there is a
deliberate departure from standardised assessment procedures so as
to elicit additional qualitative information about the learner. Murphy
and Maree (2006) argue that conventional assessment is a product-
based approach while dynamic assessment is a process-oriented
approach that focuses on the learning that takes place during an
assessment session. As dynamic assessment focuses on future
potential rather than current ability, an advantage of this assessment
approach is that it can overcome some of the cultural and educational
biases associated with assessment outcomes of conventional tests
(Murphy & Maree, 2006).
Dynamic assessment is often employed as a useful adjunct during
remedial intervention. The learner may be assessed before the
intervention (pre-test) to establish a baseline functioning and to
provide a clear idea of what learning experiences should be facilitated.
After guided instruction, an assessment (post-test) will reveal the
nature of the learning outcomes achieved (and the gain in scores from
pre- to post-test) and will indicate those which still need attention.
Further instruction followed by a re-assessment will guide further
interventions until the goals have been attained (this is often referred
to as the test-teach-test approach). Readers should consult Murphy
and Maree (2006) for an overview of research in the field of dynamic
assessment in South Africa.

❱❱ CRITICAL THINKING CHALLENGE 15.4

You have been asked to address teachers at a primary school about the use
of psychological assessment in schools. Prepare the talk that you will give.
15.4.3 Uses of tests in higher-education contexts
Entrance to universities and universities of technology is usually
dependent on meeting certain entrance requirements. Traditionally,
matriculation results have been used when determining admission to
higher education programmes. However, there is a growing body of
research that points to the unreliability of matriculation results in
predicting academic success in higher education, especially for
learners from disadvantaged educational backgrounds (e.g. Foxcroft,
2004a; Skuy, Zolezzi, Mentis, Fridjhon & Cockroft, 1996). Furthermore,
there is a need to broaden access to higher education, especially for
historically disadvantaged South Africans. In addition, national and
international research findings suggest that the combination of
entrance assessment results with high-school performance leads to
significantly better predictions of academic performance (Anastasi &
Urbina, 1997; Foxcroft, 2004a, 2005; Griesel, 2001; McClaran, 2003).
Consequently, as happens elsewhere in the world, a number of
universities and universities of technology in South Africa have
instituted assessment programmes to supplement matriculation
information. Such an assessment needs to tap generic or basic
academic competencies such as verbal, numerical, or reasoning
abilities, which are seen as the foundation for success in higher
education programmes. In addition to these generic competencies,
subject-specific content (e.g. chemistry or accountancy) may also be
assessed. However, it is not sufficient to simply tap academic
competencies, as research has indicated that there are personal
attributes and behaviours that are equally important for academic
success. Fields of interest, motivation, study skills, having career
goals, academic self-belief, and leadership ability have inter alia been
found to be related to academic success.
Apart from assisting in admissions decisions, another advantage of
assessing learners on entry to higher-education programmes is that
the resultant information can be used to guide programme planning
and the development of student support initiatives. Furthermore,
assessment results can be used to counsel students regarding the
most appropriate programme placement for them as well as specific
aspects of their functioning that require more development.
Assessment at entry to higher education thus has the potential to go
beyond simply trying to select learners with the potential to succeed. It
can, in fact, serve a more developmental purpose, which ultimately
benefits the learner.
Psychological assessment measures are also used to assess
individuals and groups of students during their university studies to
assess barriers to learning (e.g. the Learning Enhancement
Checklist), learning styles and strategies, well-being and adjustment
(e.g. Sommer & Dumont, 2011; Van Lingen, Douman & Wannenburg,
2011), career development, and so on. This assessment information is
used to plan interventions to enhance academic performance and
personal development and well-being.
A further application of psychological assessment in higher
education is to aid in learning design and materials development. For
example, based on the established relationship between spatial
perception and academic success in science and engineering
programmes, Potter et al. (2009) developed and implemented ‘high-
imagery course material for engineering students’ (p. 109). Significant
gains in three-dimensional spatial perception were found, which
impacted significantly on academic success.

15.5 Psychodiagnostic assessment


15.5.1 What is psychodiagnostic assessment?
Consider the following case scenarios:
• A person commits a murder but claims that he is innocent as ‘a voice in his head’ told him to do this.
• An 8-year-old child just cannot sit still in class and is constantly fidgeting. She daydreams frequently and cannot focus her attention on her schoolwork. She disrupts the other children, makes insensitive remarks, has aggressive outbursts, and has very few friends as a result. Her teacher wonders if she lacks discipline and structure at home, or whether she has a more serious behavioural disorder.
• A Grade 2 pupil is not making sufficient progress at school, despite the fact that he seems to be intelligent. The teacher needs to know whether this child has a specific learning problem that will require remedial intervention, or whether the child needs to be placed in a special class or school.
• A 60-year-old man has a stroke and the neurologist and family want to know which cognitive functions are still intact, and how best to plan a rehabilitation programme that will address his cognitive and emotional well-being.
• During a custody hearing, information needs to be provided in order to decide which parent should be granted legal custody of a minor child. The court needs guidance regarding which parent would be the better option.
• Following a motor-vehicle accident in which a woman sustained a serious head injury, the family wishes to submit an insurance claim as they argue that the individual has changed completely since the accident and will never be able to care for herself independently again. The insurance company needs to know whether these claims are valid before they will settle the claim.
For each of the above scenarios, psychodiagnostic assessment
can provide valuable information to aid in answering the questions
being asked. As you will discover in this section, psychodiagnostic
assessment does not refer to a specific assessment context. Rather, it
refers to particular purposes for which psychological assessment is
used. Psychologists often find themselves in a position where they
have to conduct an in-depth assessment of an individual’s cognitive,
emotional, behavioural, and personality functioning. The purpose of
such an in-depth assessment is to obtain a comprehensive picture of
an individual’s functioning, so as to determine the extent to which an
individual might, for example, be:
• intellectually challenged
• diagnosed as having a specific learning problem, which is impeding his or her progress at school
• diagnosed as suffering from a behavioural disorder such as Attention Deficit Disorder or a dependency disorder (in the case of drug and alcohol abuse)
• suffering from a major psychiatric disorder (e.g. schizophrenia, bipolar disorder, depression)
• suffering from a neuropsychological disorder (i.e. he or she might be experiencing cognitive, emotional, and personality disturbances as a result of impaired brain functioning).
Psychodiagnostic information can be employed, for example, to make
a psychiatric diagnosis and guide the nature of the intervention
required. Additionally, psychodiagnostic information can be used in a
psycholegal context to guide the court in reaching complex decisions
(e.g. in custody and child-abuse cases, for a motor-vehicle accident
claim). The nature of the referral question will determine the
purpose(s) that the psychodiagnostic assessment must serve and how
such an assessment process will be structured. Generally speaking,
the process followed in psychodiagnostic assessment is as follows:
• Clarify the referral question.
• Comprehensively evaluate an individual (or family) using multiple assessment measures and multiple sources (collateral information).
• Establish the presence or absence of certain psychological (e.g. hallucinations, personality disturbances), neuropsychological (e.g. reversals, neglect of one side of the body), or physical symptoms (e.g. fatigue, loss of appetite).
• Compare the symptoms identified with standard psychiatric and neuropsychological disorders to determine if and in which category of disorders the individual best fits.
• Make a prognosis regarding the future course of the disorder and a prediction regarding the extent to which the person will benefit from psychotherapeutic intervention, as well as the need for other types of intervention (e.g. psycho-pharmacological treatment, occupational therapy).
• Prepare an oral or written report on the outcomes of the assessment, the resultant recommendations, and, in the case of a psycholegal (forensic) assessment, express an expert opinion on the matter being considered by the court.
Psychodiagnostic assessment, like all other forms of psychological
assessment, is a dynamic and flexible process. There is no single
assessment measure or procedure that will be able to provide a
comprehensive picture of the intellectual, emotional, and
psychological functioning of a client. The psychologist, therefore, has
to develop a battery of assessment measures and procedures
applicable to the needs of the specific client and in response to the
specific referral question. The term battery is understood in this
context as a specifically selected group of measures and procedures
that will provide the required information to the psychologist, and by
implication therefore, different assessment batteries are selected for
different clients or patients (see also Chapter 9). We will briefly
discuss some of the psychological measures and procedures that are
often used.

15.5.2 Assessment measures and procedures


15.5.2.1 Interviews
An initial, detailed interview is usually the first step in the assessment
process. The initial interview may be seen as a semi-structured form
of assessment, as opposed to structured, standardised, and more
formal assessment measures, such as intellectual or personality
measures, for example. The initial interview is conducted either with
the client or, as in the case of under-age children, the parent or
guardian of the client, or some other involved and appropriate
caregiver. The information gained from this type of assessment is
varied and the specific goal of the interview will depend on the nature
of the information that the psychologist is attempting to discover. This
implies that the psychologist will tailor the nature of the open and
closed questions asked, in order to elicit different varieties of
information. For example, the initial interview aimed at obtaining
information about a school-age child with a suspected learning
problem will differ from the questioning used when the psychologist
attempts to obtain information from a psychiatric client to determine
the best therapy.
The initial interview, therefore, is a dynamic and flexible process
that relies heavily on the knowledge and expertise of the psychologist.
The information gained may be factual in nature (e.g. the duration of
a particular symptom), or inferred information, such as the
interpretation of particular non-verbal behaviour or even the avoidance
by the client of an area of information. The verbal information offered
by the client may be either confirmed or contradicted by the non-
verbal and behavioural information expressed either consciously or
unconsciously by a client.
Details of the client’s family and cultural background, birth and
developmental history, educational history, psychosocial development,
work history, marital status, significant illnesses (physical and mental),
and so on, are obtained during the initial interview. It is also important
to gain information about the history of the present problem, and the
factors that seemed to precipitate and perpetuate it. The initial
interview information is often cross-validated by interviewing
significant others in the person’s family or place of employment.
Apart from serving a history-taking function, the initial interview
provides the psychologist with an opportunity to gain first-hand insight
into the client’s mental and emotional state. A mental status
examination is thus often conducted as part of the initial interview. A
mental status examination usually consists of providing a client with
certain activities or stimuli and observing his or her response. For
example, to see how well a person is orientated to time, the
psychologist will ask the client what day of the week it is, and what
the date, month, and year are. A mental status examination
usually covers the client’s physical appearance; sleeping and eating
habits; whether the client is orientated for time, place, and person;
psychomotor functioning (e.g. does the client’s hand shake when he
or she picks up something); an analysis of the fluency and nature of
the client’s speech; screening of the client’s higher-order intellectual
functioning (e.g. attention, memory, social judgement); and screening
of the client’s emotional status. Patrick (2000) provides a brief record
form for a mental status examination that practitioners can use as a
guide. Furthermore, the Mini-mental State (Folstein, Folstein &
McHugh, 1975) is an example of a formalised mental status exam
which is widely used as a brief cognitive screening measure in
neuropsychological assessment, especially for dementia.
Administration and scoring are standardised and cut-off scores are
provided to identify abnormal performance that could be indicative of
dementia.
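
To show how a standardised screening cut-off works in principle, here is a minimal sketch (in Python; the 24-out-of-30 threshold is a commonly cited illustrative value, not a recommendation from this text or a substitute for the cut-offs in a test manual):

# Hypothetical screening rule: a total score below a fixed cut-off
# flags possible impairment and triggers referral for a fuller assessment.
MAX_SCORE = 30
CUT_OFF = 24  # illustrative value only; real cut-offs depend on the norms

def screen(total_score: int) -> str:
    if not 0 <= total_score <= MAX_SCORE:
        raise ValueError("score out of range")
    if total_score < CUT_OFF:
        return "below cut-off: refer for comprehensive assessment"
    return "within the normal screening range"

print(screen(21))  # below cut-off: refer for comprehensive assessment

A screening result like this is never a diagnosis; it merely indicates whether a more comprehensive assessment is warranted.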
The information gained from the mental status examination can
provide the psychologist with clues regarding aspects of functioning
that need to be explored further and verified by using more structured
assessment measures. In addition, the initial interview together with
the mental status examination provide the psychologist with the
opportunity to establish whether the client can be formally, and
comprehensively, assessed and whether the assessment will need to
be tailored in any way.
When interviewing significant others in the client’s life to verify and
elaborate on the information obtained in the initial interview, the
psychologist must always keep in mind the fact that the various
stakeholders in the assessment process have competing values (see
Chapter 8). When conflicting information is obtained, the reason for
this can often be traced back to the differing vested interests of the
parties involved. For example, in view of the financial compensation
that an individual may receive for the cognitive impairment or
emotional trauma suffered as a result of a head injury sustained in a
motor vehicle accident, family members might not always be totally
truthful and may exaggerate difficulties that the individual has in an
attempt to secure the compensation.
15.5.2.2 Psychological assessment measures
From the information obtained from the initial interview as well as
collateral information, the psychologist is able to hypothesise the
nature and origin of the problem that the client may be experiencing.
The psychologist will then develop an assessment battery that will
specifically assist him or her to explore and investigate all these
hypotheses. In view of the comprehensive nature of the assessment
required, all the following areas (or domains) of functioning need to be
covered in an assessment battery:
• arousal, attention, and modulation (regulation) of behaviour
• motor and sensory functions
• general cognitive abilities
• language-related abilities
• non-verbal abilities and visual perception
• memory and the ability to learn new information
• concept formation, planning, and reasoning
• emotional status and personality
• academic skills (reading, writing, and basic calculations).
Although there might be more attention paid to some of the above
aspects, depending on the picture of the client’s functioning that
emerges and the hypotheses being pursued, information needs to be
gathered on all the aspects. The full spectrum of psychological
measures covered in this book could potentially be included as part of
an assessment battery. Some measures, such as an intelligence test
battery, are quite cost-effective as the various subtests tap a number
of the important aspects that need to be assessed. In other instances,
specialised measures may be used. For example, when a person who
has sustained a head injury is being assessed, it might be useful to
use measures that have been specifically developed to tap
neuropsychological functions, or measures that have been researched
in relation to how people with head injuries or other
neuropsychological disorders perform on them.
The value of comprehensive psychodiagnostic assessment can be
considerably enhanced if the psychologist forms part of a
multidisciplinary team. Psychodiagnostic information obtained from a
psychologist can be collated and integrated with information from
other professionals, such as social workers, occupational therapists,
paediatricians, and speech therapists, to obtain a very comprehensive
picture of the person’s functioning. In addition, ‘assessment in the
clinical field is evolving to a more inclusive, individualized and
compassionate approach that brings the emic into greater focus’
(Butcher, Hass & Paulsen, 2016, p. 225). This approach includes
giving clients opportunities to contribute to setting the goals for the
assessment and providing assessment feedback during appropriate
moments in the assessment and therapeutic process. Consequently,
this approach is referred to as ‘therapeutic assessment’ (Butcher,
Hass & Paulsen, 2016, p. 225). Given that the assessment results are
interwoven with the therapeutic process, they are highly
contextualised (the emic aspect). Clients consequently gain greater
insight into the results and are more likely to find them meaningful. ‘The
therapeutic assessment perspective moves psychological assessment
away from a technical activity and turns it into an individualized,
meaningful, and life-changing experience for the client’ (Butcher, Hass
& Paulsen, 2016, p. 226).

15.5.3 Psychological knowledge and expertise


A wealth of information about the functioning of the client is thus
obtained from a variety of measures and sources during the
psychodiagnostic assessment process. The task of the psychologist is
to integrate all this complex and diverse information in order to
develop an understanding of the present status and functioning of the
client. From this understanding, the psychologist is able to draw
conclusions and make judgements regarding the most appropriate
form of intervention that will best benefit and meet the needs of the
client.
The ability to ask pertinent questions in an initial interview, pick up
on non-verbal cues, plan an appropriate assessment battery, and
integrate the diverse information to form a detailed understanding of
the person’s behaviour and problem requires considerable skill,
expertise, and knowledge on the part of the psychologist. It is for this
reason that psychologists tend to choose to focus on only one type of
psychodiagnostic assessment. Some focus more on psychiatric
assessment, where knowledge of psychiatric disorders as described in
the Diagnostic and Statistical Manual of Mental Disorders (DSM-5) or
the World Health Organisation’s latest International Statistical
Classification of Diseases and Related Health Problems (ICD-11) is of
prime importance if a sound psychiatric diagnosis is to be made.
Others focus on child custody and access assessment. In this
instance, the psychologist needs to be intimately familiar with the
various laws pertaining to custody, access, and the rights of minor
children, the research literature related to child-rearing practices and
custody outcomes, and the commonly used assessment measures.
Certain psychologists choose to focus mainly on neuropsychological
assessment. Not only is a thorough knowledge of brain organisation
and functioning required, as well as knowledge of common
neuropsychological disorders (e.g. head trauma, epilepsy,
cerebrovascular accidents, tumours), but also knowledge and
expertise related to specialised neuropsychological assessment
measures. Then there are psychologists who focus on the
assessment of children and adults who have been sexually abused or
raped, and so on. Due to choice or certain emphases during training,
psychologists tend to develop specific expertise regarding one or
some of the applications of psychodiagnostic assessment discussed
in this section. It is due to the specialist knowledge and expertise in
psychodiagnostic assessment that they often find themselves working
and testifying in legal or forensic settings. This is the topic of the next
section.

15.5.4 Psycholegal or forensic assessment


The term psycholegal assessment is used to indicate broadly the
use of psychological assessment information for legal purposes.
Forensic assessment is more narrowly focused and concerns itself
primarily with determining a person’s capacity to testify, sanity, and
criminal responsibility, as well as the extent to which the defendant is
dangerous and likely to repeat the same criminal behaviour.
The fact that psychologists operate within a scientific paradigm and
assess human behaviour using rigorous methods and measures
enables them to fulfil a unique function in legal settings (Louw & Allan,
1998). It is important to keep in mind, however, that when
psychologists testify in legal settings they do so as expert witnesses.
Merely being a psychologist does not make the person an expert. To
be considered an expert, the psychologist must have built up
specialist knowledge and expertise in a particular area of psychology
and must be recognised as an expert by his or her professional peers.
By virtue of this specialist knowledge and expertise, the psychologist
will be able to testify before the court and express an informed opinion
regarding the matter at hand.
When a psychologist undertakes a psycholegal assessment and
provides expert testimony in court cases, he or she must do so in an
ethically responsible way (see Chapter 7 of the Ethical Rules of
Conduct, Department of Health, 2006). This is especially important as
the psychologist should never take sides or be biased. At all times, the
psychologist needs to express an unbiased professional opinion,
irrespective of any pressure placed on him or her by lawyers,
attorneys, and so on. To adhere to good conduct practices in
psycholegal assessment and testimony, the psychologist should:
• ensure that they have the necessary specialist knowledge and competencies
• ensure that psychological assessment findings are used as the basis for generating conclusions, as this provides the court with verifiable information
• ensure that all the information in the report, as well as the recommendations made, can be substantiated
• testify truthfully, honestly, and in an unbiased manner
• acknowledge the limitations of the information that was gathered and the conclusions that were reached
• avoid performing multiple and potentially conflicting roles (e.g. being a witness and a consultant for the defence). It is very important that the psychologist clarifies his or her role as expert from the outset.
It might interest you to know in which areas South African
psychologists tend to do the most psycholegal work. Louw and Allan
(1998) found that child-custody cases, together with personal-injury
cases (e.g. assault cases, assessment of the symptoms following a
head injury in a motor-vehicle accident), comprised 53 per cent of the
psycholegal activities undertaken by South African psychologists. In a
test-use survey, Foxcroft, Paterson, Le Roux and Herbst (2004) found
that 5.6 per cent of psychologists surveyed indicated that they
frequently undertake child-custody assessment, while 8.3 per cent
indicated that they frequently perform other types of psycholegal
assessments.
Custody evaluations, in particular, pose many challenges to
psychologists as the different parties who have a stake in the
assessment are in an adversarial position to each other. The ethical
practice standards required for child custody assessments entail the
following:
• independent interviews and assessment sessions with both parents; parents need to be reminded that, while they need to respond honestly and truthfully, the outcome of the assessment will be used to make a custody decision, which may or may not be perceived by them as being favourable
• several interviews and assessment sessions with each child alone; children need to be informed about the purpose of the assessment
• an assessment session with the parent and children together; ideally, both parents should have an opportunity to be observed interacting with their children
• interviews with or obtaining written information from neighbours, general practitioners, social workers, teachers, baby-sitters, grandparents, friends, and so on.
Concern has been expressed that South African psychologists are not
always well-equipped to function effectively in psycholegal/forensic
contexts. For example, Nicholas (2000) evaluated reports of
psychologists submitted to the Truth and Reconciliation Commission
(TRC) and concluded that the reports were of questionable quality. He
asserted that this impacted negatively on perceptions of the
profession and felt that a monitoring mechanism needed to be
introduced to enhance the quality of expert testimony.

15.5.5 Limitations of psychodiagnostic assessment


We have stressed that the ability to reach an accurate psychodiagnostic
conclusion requires extensive training and experience. Even then, the
potential to formulate incorrect opinions is ever present. As we pointed
out in Chapter 1, psychological assessment culminates in the forming
of an informed opinion, which, since it is an opinion, always leaves
the possibility that the wrong opinion was reached. Psychologists
might inadvertently err on the side of selectively attending to only
certain bits of information, or selectively remembering certain aspects
of an interview, or not following up on a critical piece of information, or
employing faulty logic. All of these can lead to incorrect conclusions.
Psychologists thus need to guard against becoming overconfident and
need to keep the possibility that they might be wrong uppermost in
their minds.
While it is expected of psychologists who provide expert opinions
in psycholegal contexts to do so on the basis of an assessment, this is
sometimes not possible given the client’s level of functioning or
psychological well-being. What should a psychologist do in this
instance? According to paragraph 69 of the Ethical Rules of Conduct
(Department of Health, 2006), the psychologist needs to note the
limiting impact that not being able to perform a comprehensive
psychodiagnostic assessment has on the nature and reliability and
validity of the findings and conclusions reached. In such an instance,
the psychologist provides a ‘qualified opinion’ (Ethical Rules of
Conduct, Department of Health, 2006, par. 69).

❱❱ CRITICAL THINKING CHALLENGE 15.5

Tebogo, who is 48, has recently been retrenched. Since then, he has
struggled to sleep at night, has lost his appetite, sits at home staring into
space and sighing, and he complains of feeling very down in the dumps.
His wife becomes increasingly concerned about his lack of drive and energy and
his depressed mood. Tebogo is referred to you (as a psychologist) by his general
practitioner to find out what his problem is, as there is no physical cause for his
symptoms. What type of questions would you ask Tebogo in the initial interview?
What aspects of functioning would you want to tap when you plan an assessment
battery? If you were a registered counsellor or a psychometrist (in independent
practice), would you be able to perform this assessment? Motivate your answer.

15.6 Conclusion
Psychological assessment can be used in a range of contexts, some
of which were highlighted in this chapter. You probably noticed that in
many cases the same measures are used in different contexts (e.g.
intelligence tests are used in industry, education, and
psychodiagnostic assessment), although the purpose for which the
measure is used in different contexts might differ. Conversely, you
were also introduced to measures in this chapter that are only used in
a specific context (e.g. developmental assessment measures and
work-related aptitude measures). At the end of the chapter you were
reminded that the conclusions reached from an assessment are open
to alternative interpretations but that the task of the psychology
professional is to reach as informed a conclusion as possible.

CHECK YOUR PROGRESS 15.1


15.1 Describe some of the purposes for which individuals, groups,
and organisations are assessed in industry.
15.2 Distinguish between a structured, unstructured, and semi-
structured interview.
15.3 Explain what is meant by the term ‘developmental assessment’
and provide some examples of developmental measures.
15.4 Distinguish between achievement and aptitude tests used in
educational settings.
15.5 Define the concept ‘assessment battery’.
15.6 What is ‘psycholegal assessment’?

By now you should have quite a few snapshots of the way in which
assessment measures are used in different contexts. At our next stop
in the Assessment Practice Zone, you will have an opportunity to
explore issues related to the interpretation of measures, as well as
how to report on the results of an assessment.
Chapter 16
Interpreting and reporting assessment results
CHERYL FOXCROFT AND KATE W. GRIEVE

CHAPTER OUTCOMES
This chapter deals with the different ways of interpreting assessment results and reporting or conveying these to the people concerned.
By the end of this chapter you will be able to:
› explain different methods of interpreting assessment results
› explain the ethical considerations that need to be adhered to when conveying assessment results
› explain the basic methods that can be used to convey assessment results
› understand the purpose and nature of an assessment report.

16.1 Introduction
Assessment measures are used for two main purposes: research and
applied practice. In the research situation, tests are used in a variety
of ways, depending on the purpose of the research. When you are
faced with tests that measure a similar domain, your choice of test will
be determined by the research problem. For example, if you were
interested in assessing the home environments of malnourished
children, you might decide to use the Home Screening Questionnaire
(Coons, Frankenburg, Gay, Fandal, Lefly & Ker, 1982), because it
takes all aspects of a young child’s environment into account, rather
than another measure, such as the Family Environment Scale (Moos
& Moos, 1986), that focuses on relationships between family
members.
In practical applications, assessment measures are used to help
professionals to gain more information about a person and to facilitate
therapeutic intervention. For example, assessment measures may be
used for child and adult guidance, to evaluate scholastic achievement,
to investigate psychopathology or determine individual strengths and
weaknesses, and for selection, placement, and training of people in
industrial or commercial organisations (see Chapter 15). Assessment
measures are generally used to answer specific questions about a
person. For example, if a child is not performing well at school, you
may wonder if this is due to limited intellectual ability, emotional
problems, or environmental difficulties. Assessment measures can
help us understand people’s functioning and indicate where
intervention is required.
Whatever the purpose of the assessment, the results of an
assessment need to be interpreted so that a person’s functioning can
be meaningfully described and conclusions can be reached. This
chapter outlines methods for interpreting assessment results.
Furthermore, the chapter will introduce you to good practices for
conveying assessment results and ethical dilemmas faced when doing
so. The chapter concludes with guidelines for writing effective
assessment reports.

16.2 Interpretation
Once you have administered a measure and obtained a score, you
have to decide what that result means for the specific person who was
assessed. In order to reach the best possible decision, detailed
information is required about the person concerned. Why is this so? A
test score on its own is meaningless (you will learn more about this in
Chapter 17). A test score only becomes meaningful when it is viewed
in light of information about the particular measure used, as well as all
aspects of the particular person, and the specific purpose of the
assessment. The assessment practitioner therefore has to interpret a
test score in order for it to be meaningful.

16.2.1 The relation between interpretation and validity


In order to use assessment results in a meaningful way, there must be
some relation between the results and what is being interpreted on the
basis of those results. For example, if we interpret a verbal intelligence
quotient of 79 as meaning that a child has below-average verbal ability
(because the average is 100 and the standard deviation is 15), we
must know that the child does indeed have poor verbal skills. We can
look at another example in a different context. You have many
applicants for a job and can only appoint 40, so you decide to give
them all an aptitude measure and accept only those applicants who
obtain scores above a certain cut-off score. You need to know
beforehand that there is a relation between the cut-off score and job
success, otherwise you may appoint applicants who perform above
the cut-off score but cannot do the work (see Chapter 3).
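
To make this norm-referenced arithmetic concrete, the short sketch below (written in Python; the score of 79 and the scale with a mean of 100 and a standard deviation of 15 come from the example above, while everything else is purely illustrative) shows how a score is located relative to its norm group:

from statistics import NormalDist

# Many standardised measures express scores on a scale with
# a mean of 100 and a standard deviation of 15.
MEAN, SD = 100, 15
norm = NormalDist(mu=MEAN, sigma=SD)

score = 79                          # the verbal IQ from the example above
z = (score - MEAN) / SD             # z = -1.40 (1.4 SDs below the mean)
percentile = norm.cdf(score) * 100  # roughly the 8th percentile

print(f"z = {z:.2f}, percentile = {percentile:.1f}")
# As stressed in the text, describing this score as 'below average'
# is only defensible if the measure itself is valid for the purpose.

Note that the code merely locates the score within the norm group; deciding what the score means for this particular person remains an act of interpretation.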
This brings us to the question of validity (see Chapter 5).
Interpretations of test scores depend on the validity of the measure or
the information used. There are different forms of interpretation that, to
a greater or lesser extent, are related to the different types of validity
discussed in Chapter 5.
16.2.1.1 Descriptive interpretation
Descriptive interpretations try to describe the test-takers as they are
and in terms of the way they behave at the time of testing. Descriptive
interpretations are based on currently available information. For
example, a descriptive interpretation of an IQ score of 100 would be
that the test-taker is average. Descriptive interpretations do not
include attempts to interpret a score in terms of prior ability or
disadvantage or in terms of future predicted behaviour.
Descriptive interpretations are dependent on construct, content,
and concurrent validity. For example, on completing an interest
inventory, Lebo has a higher score for scientific than practical
interests. The descriptive interpretation that Lebo is better at research-
related activities than mechanical ones can only be made if sufficient
information is available about the validity of the measure. Firstly, there
has to be proof that the assessment measure does in fact measure
research and mechanical abilities (construct validity). After all,
scientific interests could include other things besides research, and
practical interests could include abilities other than mechanical ones.
Secondly, the items in the measure should be suitable for the
standardisation population. For example, the content of the items
should actually reflect scientific or practical interests (content
validity). Thirdly, the test scores should correlate with scores on other
measures of the same characteristic (concurrent validity). If these
conditions are not met, the interpretation will not be deemed valid.
16.2.1.2 Causal interpretation
Causal interpretation refers to the kind of interpretation that is made
about conditions or events in a test-taker’s background, based on
assessment results. For example, the decision may have to be made
as to whether a child has the ability to do well in an academic course
or would do better in a technical school. If the child has worked hard
and, despite a supportive environment, still struggles with academic
tasks, the interpretation could be made that there is some condition
(perhaps a learning disability) that makes academic study difficult. A
possible explanation for current behaviour is found by interpreting the
test score in light of the child’s past development. In this case, validity
can be determined by examining information on the children in the
standardisation sample to establish whether certain past conditions
(such as a neurological condition) have any bearing on current
assessment scores. This type of interpretation can also help, for
example, to determine the effect of a head injury based on
assessment results (see Chapter 15).
Causal interpretations can also be made for scores on a
personality measure. For example, in the case of the Children’s
Personality Questionnaire (see Chapter 11), the appendix to the
manual contains a summary of the background and biographic
correlates available for each factor. There is empirical evidence that a
low score on the emotional-stability factor is associated with parental
divorce or separation, or conflict among children in the family (Madge
& Du Toit, 1967).
16.2.1.3 Predictive interpretation
Look at the following example: Andrea obtains high scores for
numerical ability, three-dimensional reasoning, and mechanical insight
on an aptitude measure. The counsellor interprets these scores as
meaning that Andrea has the ability to follow a career in the field of
engineering. This is an example of predictive interpretation. The
counsellor knows that there is a relation between current aptitude in
these fields and a future criterion, in this case a successful career in
engineering. This type of interpretation rests heavily on the predictive
validity of the aptitude measure.
16.2.1.4 Evaluative interpretation
The fourth form is that of evaluative interpretation, which combines
an interpretation of a test score with a value judgement based on
available information about the test-taker. This is generally what
happens in a counselling situation. Evaluative interpretations lead to a
recommendation, which can only be justified if the validity of the other
information is known. Let’s look at the following example: A young
woman who has a specific reading disability does well on an
intelligence measure and wants to know whether she should study
law. She also has an interest in accounting and business. The
counsellor’s recommendation is for her not to study law (despite her
above-average intellectual ability), and for her to rather pursue her
interest in accounting and business. On the one hand, this
recommendation implies that a reading disability will have a negative
effect on the ability to study law and presumes that a reading disability
predicts performance in law (predictive validity). On the other hand,
the recommendation assumes that a reading disability will not affect
performance in the business world. This type of recommendation,
based on evaluative interpretations of assessment results, can only be
made when supported by validity data.
You can see that, as you shift from a descriptive approach to an
evaluative approach, you move away from objective data towards
more subjective assessment (Madge & Van der Walt, 1997). The
descriptive approach relies on an assessment result to describe the
individual, while the others rely on increasingly more subjective input,
until interpretation is combined with professional evaluation in the last
form of interpretation.

16.2.2 Methods of interpretation


In general, there are two broad approaches to the interpretation of
assessment results: mechanical and non-mechanical.
16.2.2.1 Mechanical interpretation of assessment results
The mechanical approach is the psychometric or statistical way of
looking at assessment results. Cronbach (1970) has also called it an
actuarial approach. What this means is that an assessment result is
interpreted purely as a statistic. Those who support this method of
interpreting assessment results do so on the basis that it is objective,
reliable, scientific, mathematically founded, and verifiable (Madge &
Van der Walt, 1997). In other words, it fits in with the logical positivistic
or empirical approach to understanding behaviour, which rests on the
assumption that all behaviour is observable and measurable. An
example of this method of interpretation can be found in computerised
interpretation of assessment scores (see Chapter 14). For example,
users of the MMPI-2 (see Chapter 11) may obtain a computer printout
of numerical scores on each of the subscales, as well as diagnostic
and interpretive statements about the test-taker’s personality and
emotional functioning.
Mechanical interpretation includes the use of profile analysis and
comparison of standard scores as well as techniques such as
regression and discriminant analysis. In this way, scores are used like
a recipe. That is, in the same way that ingredients are combined
according to a recipe to make a cake, certain combinations of scores
can be interpreted to mean something specific. For this reason,
mechanical methods are better suited to prediction purposes than to
descriptive or evaluative interpretations.
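To make the 'recipe' idea concrete, the sketch below (in Python) combines subtest standard scores mechanically using fixed regression weights. The scores, weights, and intercept are purely hypothetical illustrations, not values from any published measure; in practice, such a formula would come from a regression study against the criterion.

    # A minimal sketch of mechanical (actuarial) interpretation: a fixed
    # linear 'recipe' applied to standard scores. All values are hypothetical.

    def predicted_criterion(scores, weights, intercept):
        # Weighted sum of subtest scores plus an intercept, as in a regression equation
        return intercept + sum(weights[name] * score for name, score in scores.items())

    scores = {'numerical': 62, 'verbal': 55, 'spatial': 68}      # hypothetical T-scores
    weights = {'numerical': 0.4, 'verbal': 0.3, 'spatial': 0.2}  # hypothetical regression weights

    print(predicted_criterion(scores, weights, intercept=5.0))   # 59.9

Because the same weights are applied to every test-taker, the result is objective and verifiable, which is precisely the appeal of the mechanical approach.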
Profile analysis

A profile is defined as 'a graphical representation of a client's test scores, which provides the test user with an overall picture of the testee's performance and his [or her] relative strong and weak points' (Owen & Taljaard, 1996, p. 133).

A profile can be compiled from scores using different measures (such as several cognitive measures), or scores from subtests of the same measure (such as the 16PF). When using scores from different measures, it is important that:
• the norm groups for the different measures are comparable
• the scores are expressed in the same units (e.g. standard scores)
• errors of measurement are taken into account.
However, it is important to note that a profile that describes a
particular population may not hold for an individual. That is, not all
subjects with a specific disorder necessarily manifest a specific
cognitive profile associated with the group having that disorder and, in
addition, there may be others who manifest the pattern or profile but
do not have the disorder. In addition to test results, information on
other areas of functioning should also be considered when identifying
or describing specific disorders and conditions.
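One way to take errors of measurement into account when comparing two scores in a profile is to ask whether their difference exceeds the standard error of the difference between them. The Python sketch below illustrates this; the scores and SEM values are hypothetical, and in practice the SEMs would be taken from the test manuals.

    import math

    def reliable_difference(score_a, score_b, sem_a, sem_b, z=1.96):
        # True if the gap between two scores exceeds measurement error
        # at roughly the 95 per cent level
        sed = math.sqrt(sem_a ** 2 + sem_b ** 2)  # standard error of the difference
        return abs(score_a - score_b) > z * sed

    # Hypothetical standard scores and SEMs for two subtests in a profile
    print(reliable_difference(112, 98, sem_a=4.0, sem_b=5.0))  # True: an interpretable gap

A difference that fails this check may simply reflect measurement error and should not be interpreted as a real strength or weakness.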
16.2.2.2 Non-mechanical interpretation of assessment results
Most assessment measures are developed to meet rigorous
psychometric criteria such as validity and reliability. In an ideal world,
assessment scores would give a perfect indication of a person’s
attributes or abilities. In the real world, scores are seldom perfect
indicators and the person administering the measure has to interpret
the scores in light of other information about the test-taker. This is
referred to as non-mechanical interpretation. In other words,
assessment scores are not regarded as statistics but meaning is
inferred from assessment results. This is what happens in clinical
practice and this approach is described as impressionistic or dynamic,
because it is more sensitive and encompasses a more holistic view of
the test-taker. The assessment practitioner uses background
information, information gained from interviews, as well as test results,
to form an image or impression of the test-taker (see Chapters 1 and
15). One of the main advantages is that test data can be used
together with information that can be obtained from facts and
inferences obtained through clinical methods (Anastasi & Urbina,
1997).
Of course, there are always dangers in non-mechanical
interpretation, the greatest danger being the possibility that the
assessment practitioner could over-interpret the available data. The
assessment practitioner has to guard against the tendency to
overemphasise background information, and under-emphasise the
test scores and individual attitudes and behaviours of the test-taker.
The non-mechanical approach is more difficult and demanding than
the mechanical approach due to the element of subjectivity. This
approach requires greater responsibility from the assessment
practitioner to understand the theoretical basis of the measure, its
statistical properties, as well as the assessment practitioner’s own
orientation and subjective input. The competence to exercise this responsibility can be gained through thorough academic training, keeping up to date with developments in the field, and extensive training and experience with the assessment measures.
16.2.2.3 Combining mechanical and non-mechanical approaches
Both the mechanical and non-mechanical approaches have been
criticised (Madge & Van der Walt, 1996). Supporters of the non-
mechanical approach criticise the mechanical approach for being
artificial, incomplete, oversimplified, pseudo-scientific, and even
‘blind’. On the other hand, the non-mechanical approach is criticised
for being unscientific, subjective, unreliable, and vague.
In fact, the two approaches should not be seen as mutually
exclusive. They can be integrated and used to complement each
other. Matarazzo (1990) has suggested that the most effective
procedure in our present state of knowledge is to combine clinical and
statistical approaches. The mechanical approach may be used to
obtain a score on an intelligence measure, but the non-mechanical
approach can be used to consider that score in light of what is known
about the test-taker’s particular context. For example, a score of 40 on
a motor-proficiency test would be below average for most children, but
for a child who has sustained a severe head injury and has some
residual paralysis, this score may mean that considerable progress
has been made. It is important that any measure used should have
sound statistical properties and that the assessment results should be
viewed within the context of an evaluation of the test-taker, in order for
these to be interpreted meaningfully.
Behavioural cluster analysis is a particular method of interpreting
and synthesising diverse results from a test battery to identify
congruent patterns of behaviour/competencies linked to disorders
(e.g. a mild head injury, a learning problem) or a particular job
competency profile by combining mechanical and non-mechanical
approaches. The first step in behavioural cluster analysis is to list all
the measures used that tapped a certain behaviour/competency. For example, to assess the verbal reasoning ability of an applicant, a Reading Comprehension test and the Similarities and Vocabulary subtests of an intelligence test were used, together with an interview.
The second step is to interpret each of the test scores obtained
mechanically. The third step is to combine all the test information
obtained with behavioural observations and collateral information to
arrive at an integrated descriptive narrative of the verbal reasoning
ability of the applicant, using the non-mechanical interpretation
method. For example, as the applicant obtained an average score for
the Vocabulary subtest, could express herself clearly when answering
questions during the interview, and obtained below-average scores for
Similarities and Reading Comprehension, the integrative descriptive
narrative could be as follows: While the applicant has a good
vocabulary and expresses herself clearly, she has some difficulty
thinking systematically and logically when she needs to analyse,
evaluate, and understand complex written material and verbal
arguments. Behavioural cluster analysis would be conducted for each
of the behaviours/competencies assessed to determine whether the
pattern of performance across the behaviours/competencies is
congruent with a certain disorder or job-competency profile. For
example, lawyers need to have good verbal reasoning ability. So, if a
school-leaver wishes to pursue a career in law, but has some verbal
reasoning difficulties, a career counsellor might either suggest that the
person pursue another career or develop certain verbal-reasoning
skills (e.g. identifying and developing an argument).
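The first two steps of a behavioural cluster analysis lend themselves to a simple data structure: the measures are grouped per competency (step 1) and each score is interpreted mechanically against the norm (step 2). The third, non-mechanical step of integrating observations and collateral information into a narrative remains the practitioner's task. The battery and scores below are hypothetical, loosely mirroring the verbal-reasoning example above.

    # Steps 1 and 2 of a behavioural cluster analysis, with hypothetical data.
    # Step 3 (the integrative narrative) is non-mechanical and is not automated here.

    battery = {  # step 1: the measures that tapped each competency
        'verbal_reasoning': {
            'Reading Comprehension': 38,
            'Similarities': 39,
            'Vocabulary': 51,
        },
    }

    def mechanical_band(score, mean=50.0, sd=10.0):
        # Step 2: interpret a standard score purely mechanically
        if score < mean - sd:
            return 'below average'
        if score > mean + sd:
            return 'above average'
        return 'average'

    for competency, measures in battery.items():
        print(competency)
        for name, score in measures.items():
            print(f'  {name}: {mechanical_band(score)}')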

16.2.3 Interpretation of norm-referenced tests


In Chapters 1 and 6 you learned that there are two types of measures:
norm-referenced and criterion-referenced measures. You will no doubt
remember that in a norm-referenced measure each test-taker’s
performance is interpreted with reference to a relevant standardisation
sample (Gregory, 2010). Let us look at a practical example. On the
SSAIS-R (see Chapter 10), the mean scaled score for each subtest is
10. This means that a child who obtains a scaled score of 10 is
considered average in comparison to the performance of all the
children in the normative sample. Therefore, it is important to establish
that the particular child you are assessing matches, or is relatively
similar to, the characteristics of the normative sample. If they do not
match, you do not have a basis for comparison and your interpretation
of the test score will not be meaningful. The results of norm-
referenced measures are often reported as scores, such as
percentile ranks or standard scores (see Chapter 3), which are
calculated on the basis of the performance of the group on whom the
measure was normed.
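As an illustration of how such derived scores are calculated, the sketch below converts a raw score to a z-score against hypothetical norm-group statistics, then to a scaled score (mean 10, SD 3, as on the SSAIS-R subtests) and an approximate percentile rank, assuming normally distributed scores.

    from math import erf, sqrt

    def z_score(raw, norm_mean, norm_sd):
        # Position of a raw score relative to the normative sample
        return (raw - norm_mean) / norm_sd

    def scaled_score(z, mean=10, sd=3):
        # Scaled-score metric with mean 10 and SD 3
        return mean + sd * z

    def percentile_rank(z):
        # Approximate percentile under a normal distribution
        return 50 * (1 + erf(z / sqrt(2)))

    z = z_score(raw=30, norm_mean=25, norm_sd=5)  # hypothetical norm statistics
    print(scaled_score(z))       # 13.0: one SD above the norm-group mean
    print(percentile_rank(z))    # about 84.1: outperforms some 84% of the norm group

This also shows why the match with the normative sample matters: change norm_mean and norm_sd, and the same raw score yields a different interpretation.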
In practice, it often happens that the test-taker does not exactly
match the normative sample. For example, in South Africa we often
use measures that were normed in other countries. This factor must
be taken into account when interpreting assessment results. In this
instance, the score cannot be considered an accurate reflection of the
test-taker’s ability but should be seen as merely an approximate
indication. Clinical interpretation then becomes particularly
important. For example, many of the neuropsychological measures
currently used to determine the presence or severity of brain damage
were developed overseas and local alternatives do not exist.
However, due to commonalities in brain-behaviour relationships and
the cognitive processes associated with them, there is sufficient
evidence that certain patterns of scores are applicable cross-culturally
(Shuttleworth-Edwards, 2012; Shuttleworth-Jordan, 1996). These
measures can be used in a multicultural society such as South Africa,
provided that the scores are interpreted qualitatively and not in a
recipe fashion.

16.2.4 Interpretation of criterion-referenced measures


Whereas norm-referenced measures are interpreted within the
framework of a representative sample, criterion-referenced
measures compare the test-taker’s performance to the attainment of a
defined skill or content (Gregory, 2010). In other words, the focus is on
what the test-taker can do rather than on a comparison with the
performance of others (norm-referenced). An example of a criterion-
referenced measure with which we are all familiar is a school or
university examination. The test-taker is required to master specific
subject content and examinations are marked accordingly, irrespective
of how well or how badly other students perform. In this way, criterion-
referenced measures can be meaningfully interpreted without
reference to norms. What matters is whether the test-taker has met
the criterion, which reflects mastery of the content domain required (in
the case of academic examinations, the criterion is usually a mark of
50 per cent).
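The contrast with norm-referenced scoring can be captured in a two-line sketch: a criterion-referenced decision depends only on the fixed criterion, never on how other test-takers performed. The 50 per cent criterion below follows the examination example above.

    def has_mastered(mark_percent, criterion=50.0):
        # Criterion-referenced decision: compare to the fixed criterion,
        # irrespective of other test-takers' performance
        return mark_percent >= criterion

    print(has_mastered(63.0))  # True: the mastery criterion is met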

16.3 Principles for conveying test results


Assessment practitioners often do not clearly understand the importance of how feedback on the outcome of assessment is conveyed. There are certain practical and ethical considerations to
be taken into account when conveying assessment results to test-
takers and other interested people. Not only the content but also the
way in which assessment results are conveyed can have a
considerable impact on people’s lives. Assessment should not be
seen as an isolated event. It is part of the process of evaluating and
taking action in the interests of the test-taker (see Chapter 1). In the
case of psychological assessment, the assessment process (that
includes conveying results) should be seen as a form of psychological
intervention. Decisions that are made based on assessment results
can change the course of someone’s life.

16.3.1 Ethical considerations


Certain professional ethical values guide the use and interpretation of
results as well as the way in which these results are conveyed (see
Chapter 8 and Department of Health, 2006, Chapter 5). The
discussion that follows pertains to psychological measures in
particular but, in general, should be taken into account with any kind of
measure.
Irrespective of whether psychological measures or other
procedures are used, all behavioural assessment involves the
possible invasion of privacy (Anastasi & Urbina, 1997). This principle
sometimes conflicts with psychologists’ (and other professionals’)
commitment to the goal of advancing knowledge about human
behaviour. The deciding factor in choosing to administer a
psychological measure is usually whether including it is essential to
achieve the purpose of the assessment. However, in exercising the
choice to assess a person using a psychological measure, the
psychology professional must ensure that the test-taker’s privacy is
protected.
16.3.1.1 Confidentiality
By law, psychological services are confidential. This means that a
psychologist may not discuss any aspect relating to clients (or test-
takers) without their consent. Assessment practitioners must respect
the test-taker’s right to confidentiality at all times.
The right to confidentiality raises the question: Who should have
access to assessment results? Assessment results may not be
divulged to anyone without the knowledge and consent (preferably
written) of the test-taker. However, in some instances a psychologist
may be compelled by law to provide assessment results. When
required to do so, the psychologist should make it clear that he or she
is doing this under protest. Ethical dilemmas do arise regarding the
issue of confidentiality (see Box 16.1 below) and the professional has
to take responsibility for whatever action is taken, always
remembering that serving the best interests of the test-taker is
paramount.

❱❱ ETHICAL DILEMMA CASE STUDY 16.1

While you are administering a test to Thabo (who is 10 years old), he confides in
you that he is a victim of sexual abuse and that the perpetrator is a well-respected
member of the family. What should you do? The situation in South Africa dictates
that if you suspect abuse of any kind, you are required by law to report it. In
matters of sexual abuse, it is generally best to encourage the person
concerned to report the problem themselves. In the case of a child, this is
more difficult. If you decide to report the matter yourself, make sure to have all
the facts first, to take steps to protect the child (this requires simultaneous
multidisciplinary intervention), and to ensure that professional services are
available to the family to deal with reactions to the matter. If adequate steps
are not taken, serious complications may arise (e.g. innocent people can be
traumatised, relationships in families can be irreparably damaged, and the
psychological practitioner can be sued for breach of confidentiality).

It is not considered a breach of confidentiality when psychologists provide information to other professionals who are consulting
members of a professional team. For example, in psycholegal
(forensic) cases, a psychologist may discuss an individual with other
members of a multidisciplinary team (see Chapter 15). The guiding
principle should be that the psychologist shares information to the
extent that it is in the best interests of the individual concerned and
with their permission.
The situation is made a little more difficult in the work context
where the interests of the organisation generally predominate over
those of the individual (see Chapter 8). The psychologist has to
ensure that the data is used only for the purpose for which the person was assessed. For example, if a psychologist administers a measure
to determine a person’s clerical skills, the results should not be used
to give an indication of that person’s intellectual ability.
16.3.1.2 Accountability
Psychologists should remember that they are, at all times,
accountable for the way in which assessment measures are used, test results are interpreted, and assessment data is ultimately stored. Proper training, an up-to-
date knowledge base, and sensitivity to the needs of the test-taker are
essential components of the assessment procedure. You can consult
Chapters 8 and 18 for more information in this regard.
It is important to remember that test-takers have rights as well (see
Box 16.2).
Accountability also includes taking steps for the safe and secure
storage of assessment results, and disposal of obsolete data.
Assessment data should be stored securely so that no unauthorised
person has access to it. Psychologists are required to keep records of
client data, but when assessment information is obsolete and may be
detrimental to the person concerned, the data should be destroyed.

16.3.2 Methods of conveying assessment results


Assessment results should always be conveyed within the context of
an interpersonal situation or relationship. This means that the
assessment practitioner should be prepared to be supportive of the
test-taker’s emotional reactions to the results. The assessment
practitioner also needs to show respect for the test-taker’s rights and
welfare. Assessment results should be conveyed with sensitivity and
directed at the level on which the test-taker is functioning. The
psychologist should be aware of the person’s ability to take in the
information and may need to convey results at different times, as and
when the person is ready.

BOX 16.2 A TEST-TAKER’S BILL OF RIGHTS

When someone is referred for assessment, it is easy for the assessment practitioner
to adopt a position of authority and lose sight of the individual concerned. Part of
professional accountability is to consider the rights of test-takers (Rogers, 1997).

Test-taker's Bill of Rights
• Respect and dignity: Always; not negotiable
• Fairness: Unbiased measures and use of test data
• Informed consent: Agreement to assessment with clear knowledge of what will happen; right to refuse assessment
• Explanation of test results: Clear and understandable explanation
• Confidentiality: Guarantee that assessment results will not be available to others without your express permission
• Professional competence: Assessment practitioners must be well trained
• Labels: Category descriptions should not be negative or offensive
• Linguistic minorities: Language ability should not compromise assessment results
• Persons with a disability: Disability should not compromise assessment results

It is often helpful to ask the test-taker about his or her own knowledge
or feelings about the aspect of behaviour that was assessed. This may
serve as a starting point for further elaboration. Discussing the test-
taker’s experience of the measures can provide information that can
confirm the validity of the results or throw light on ambiguous findings.
The test-taker should also be given the opportunity to express feelings
about the assessment procedure and to ask for further information.
Assessment results should be conveyed in a way that will best
serve the original purpose for which the test was administered. If, for
example, you collect test data with the aim of revising a school
curriculum, the test results should be interpreted and conveyed with
that purpose in mind.
As far as possible, assessment results should be conveyed in
general terms, in descriptive form rather than as numerical scores.
Test scores, such as those obtained on intelligence measures, are
often misunderstood and may lead to labelling, which can stigmatise a
person. This is particularly important when the test-taker differs from
the members of the norm group, in which case the score obtained
may not be an adequate reflection of the test-taker’s capabilities or
characteristics. In addition, the assessment practitioner should use
language that is understandable and avoid technical terminology or
jargon.
Probably the most important point is that communicating
assessment results should always occur in a context that includes
consideration of all relevant information about the individual who takes
the measure. We will discuss the importance of this point in Chapter
17. Contextual information provides a framework that makes any test
result richer and more meaningful.

16.3.3 Conveying children’s assessment results


Children are usually assessed not at their own request but because
they are referred by parents, the school, or other interested parties. It
is therefore important to involve children in understanding the reasons
for being assessed and in conveying assessment results.
Children may misinterpret the reasons for assessment. For
example, a 14-year-old girl living in a children’s home was referred for
psychological assessment to assist with guidance and future planning.
The girl was very resistant to assessment because she thought it
meant she was going to be sent to an industrial school. Explaining the
purpose of assessment and how it would serve her best interests
ensured her cooperation and enhanced her performance. Similarly,
involving her in the feedback of assessment results made the process
more meaningful and helped her feel part of the decisions being made
about her life.
When children are to be assessed, the assessment practitioner
has the obligation to explain to parents the nature of the measures to
be used, the conclusions that can be reached, and the limitations of
the assessment data.
Parents are often mystified by psychological measures and
subsequent intervention is made more meaningful by involving them in
the process. The reason for assessing the child determines the
information that is conveyed to parents. The psychologist needs to be
honest with parents, but also sensitive to the possibility of upsetting
them. The results of the assessment can be conveyed in the form of a
description of the child’s functioning, indicating what action needs to
be taken, rather than merely providing scores or labels. For example,
it may be more helpful to explain to parents that a child is distractible,
and needs more structure and routine, than saying that the child has
an Attention Deficit Disorder, a label that can lead to stigmatisation
and negative perceptions. The psychologist should emphasise the
child’s strengths and encourage the parents to do the same. Parents
sometimes have unrealistic expectations about their children, which
can be damaging in itself. The psychologist can help parents to gain a
more realistic perception of their children’s abilities without resorting to
scores and diagnoses.
The psychologist also needs to take the parents’ characteristics
and situation into account when conveying information about
assessment results. On the one hand, this applies to their education and knowledge of assessment; for example, the terms that the psychologist uses should fit in with the parents' life experience. On the other hand, it applies to their anticipated emotional responses to the information, as parents will not always receive the results with calm and rational acceptance. Many parents come away from an information session
feeling bewildered or angry, because they have not fully understood
what has been said or the implications of the findings and may have
been too self-conscious to say so. The psychologist should be
sensitive to them and allow opportunities for expressing feelings and
asking questions that concern them.
It is not always preferable for the child who was assessed to be
present when the results are discussed. When the focus is on abilities
and aptitudes, the results can be discussed with the child and parents
together. When emotional or behavioural issues are the focus,
separate discussions can be followed up by joint sessions as required
for therapeutic reasons.
You can consult texts on counselling skills for more in-depth
treatment of this topic.
16.4 Reporting assessment results in written form
Communicating assessment results usually includes the preparation
of a written report that is often followed by discussion and consultation
with the test-taker and other interested parties. Parents, educational
institutions, and other organisations often request written reports on
assessment results. Even if a report is not requested, it is useful to
prepare one not only as a record for future use, but also because it
helps to organise and clarify thoughts about the test results. It serves
a synthesising function for the person who administers the measures.
There is no one standard way in which reports should be written. In
terms of style, reports should be written clearly and simply, avoiding
professional jargon, with the main aim of avoiding misinterpretation of
results. The following are general guidelines for effective report
writing:
• provide identifying information, including the date of the assessment
• focus on the purpose for which the individual was tested
• provide relevant facts only; if some background information is required, provide only the detail relevant to the assessment process
• write the report with the nature of the audience in mind; if the report is for parents it may be more personal or informal, but if it is directed at an organisation, different information may be required
• comment on the reliability and validity of the assessment; for example, the extent to which the client cooperated and was motivated to perform optimally can be used to reflect on the accuracy or validity of the test results obtained
• list the assessment measures and other information-gathering techniques (e.g. an interview) that were used; some practitioners prefer to present the results for each measure separately, while others prefer to organise the report according to the behavioural domains/competencies assessed and provide an integrative descriptive narrative per behavioural domain/competency
• concentrate on the test-taker's strengths and weaknesses that constitute differentiating characteristics; the measure of an effective report is whether it is unique to the individual or could apply equally well to someone else
• use general, understandable terms to describe behaviour
• focus on interpretations and conclusions; test scores are not included in reports but may be divulged on special request and with the consent of the test-taker
• where recommendations are made, it must be evident to the reader why or how these flow from the assessment results
• uphold ethical standards and values
• remember that the quality of technical presentation says a great deal about you as a professional
• as the report writer is ultimately accountable for what is contained in the report, ask yourself whether you can substantiate everything that you have written
• remember to authenticate your report, that is, sign it and include your credentials (e.g. 'registered clinical psychologist'); if the assessment practitioner is not yet registered for independent practice (e.g. an intern, a psychometrist working under supervision), the report must be developed and co-signed by a registered psychologist.
The Protection of Personal Information Act (No. 4 of 2013) and the
Promotion of Access to Information Act (No. 2 of 2000) address ‘the
right to privacy, the right to information and the unlawful sharing of
information’ (Muleya, Fourie & Schlebusch, 2017, p. 2). Individuals
have a right to any information about themselves. This means that
reports cannot be withheld as confidential. However, apart from this
consideration, the main aim of the report should be to provide
meaningful information that will be useful to test-takers for the purpose
for which they were assessed.

16.5 Conclusion
In this chapter, you have learned that there are different ways of
interpreting and conveying assessment results, according to the
purpose of assessment, the individual being assessed, and the
context in which that individual functions. Interpretation of assessment
results relies on the validity of the particular measures used and
methods of interpretation vary according to the type of measures
applied. In the face of all this variation, it is the human element that is
the most important. The trained professional makes assessment
results meaningful by interpreting them in an ethically acceptable and
personally accountable way, based on training and expert knowledge.
In this way, numbers (test scores) can be transformed into valuable
information that serves the best interests of the individual.

ADVANCED READING

Following good practice in analysing, interpreting, and conveying assessment results is of such critical importance that the ITC Guidelines for Quality Control in Scoring, Test Analysis, and Reporting of Test Scores (International Test Commission, 2012) have been developed to assist assessment practitioners. The guidelines can be retrieved from https://www.intestcom.org/page/19

❱❱ CRITICAL THINKING CHALLENGE 16.1

Page back to Ethical Dilemma Case Study 15.1 in Chapter 15. A case study was
presented of Faith, a 5-year-old, who was assessed at the suggestion of a
psychologist and with the consent of her caregiver, Jenny. The assessment
findings on the five subscales of the Griffiths III were provided. Write a report for
Faith’s parents on the assessment results using the principles outlined above. Do
not use psychological jargon; focus on Faith’s strengths, and make practical
suggestions regarding aspects of Faith’s development that need to be stimulated.

❱❱ ETHICAL DILEMMA CASE STUDY 16.2

A registered counsellor (RC) assesses an adult (Lucas) who is considering a mid-life career change. On completion of the assessment, the RC scores the tests and
interprets the findings. The RC has a feedback session with Lucas and advises
him that he is best suited to the position that he currently holds. Lucas asks for a
written report. The RC develops the report and emails it to Lucas using his work
email address. Lucas saves a copy of the report on his computer at work. A few
days later he is called in by his supervisor and is told that he is being fired as the
fact that he consulted an RC about exploring other career options indicated that
Lucas was not committed to his present job. When Lucas asked the supervisor
where she obtained this information, she indicated that she came across the RC’s
report while working on Lucas’ computer, and opened and read it.

Questions
(a) Was it appropriate for the RC to email the report to Lucas?
Motivate your answer.
(b) If a psychology professional forwards a copy of a client’s report electronically,
what does he or she need to do to ensure that confidentiality is not breached?

CHECK YOUR PROGRESS 16.1


16.1 Briefly describe the role of test validity in descriptive and
evaluative interpretation of test scores.
16.2 Distinguish between descriptive and predictive interpretation of test results.
16.3 What are the three criteria for profile analysis based on scores
from different measures?
16.4 Discuss the main advantages and disadvantages of non-
mechanical interpretation of assessment results.
16.5 Discuss the use of a behavioural cluster approach to
interpreting and integrating assessment results.
16.6 In the context of guidelines for report writing, provide a critical
discussion of the statement that test scores should not be
included in the report on the results of the assessment.
16.7 John’s parents have requested psychological assessment for him because he
is not doing well at school and is starting to display acting-out behaviour. His
parents are very disappointed because they have made sacrifices for him and
expect great things of him. The assessment results suggest that John is of
below-average intellectual ability, he is not coping at school, and feels unsure
of himself. As a result, he engages in behaviour that helps him feel in control.
What guidelines should you use when you convey the test results?

By now, you are well aware of the complexities of assessment practice. Our next stop will further highlight the many factors that
impact on the assessment process.
Chapter 17
Factors affecting assessment results
CHERYL FOXCROFT AND KATE W. GRIEVE

CHAPTER OUTCOMES
By the end of this chapter you will be able to:
› understand how the individual’s context impacts on assessment results
› understand how variations in administering a measure impact on assessment performance
› understand the factors to consider when interpreting variations in test
scores
› explain how characteristics of the assessment practitioner can impact on
assessment performance
› explain how characteristics of the test-taker impact on assessment
performance
› understand the impact of cheating on test performance and how to detect
and combat it
› understand how bias impacts on the validity of the assessment results.

17.1 Introduction
Psychological assessment is based on the notion that people have
inherent abilities and characteristics that can be measured. These
broad traits or abilities are partly inherited but are not static, because
the way they are manifest can vary according to different contexts and
conditions. It is because we assume that traits permeate behaviour in
many contexts that we undertake psychological measurement and
make assumptions about behaviour. However, it is important to
remember that a test score is just a number. It is not equivalent to the
ability that is being measured. When we measure two kilograms of
potatoes we know exactly what we are measuring; with psychological
measures, however, we cannot measure an attribute or ability directly
and we therefore have to infer it from a person’s test performance. In
so doing, we give the person a score (or number) on the basis of the
response given. A score is therefore an inferred or indirect measure of
the strength of that attribute or ability. You learned about
measurement error in Chapter 3, and will therefore know that there
are factors in addition to a person’s inherent ability that can influence
scores and the meaning of those scores. In this chapter, we will look
at some important factors that should be considered when interpreting
performance on psychological measures.
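To make the link with measurement error concrete, classical test theory expresses the imprecision of an observed score as the standard error of measurement, SEM = SD × √(1 − reliability), which can be used to place a confidence band around the score. The sketch below uses hypothetical values for illustration.

    import math

    def sem(sd, reliability):
        # Standard error of measurement from classical test theory
        return sd * math.sqrt(1 - reliability)

    def confidence_interval(observed, sd, reliability, z=1.96):
        # Approximate 95% band within which the 'true' score is likely to fall
        margin = z * sem(sd, reliability)
        return observed - margin, observed + margin

    # Hypothetical deviation-IQ metric: SD 15, reliability .90
    print(confidence_interval(observed=105, sd=15, reliability=0.90))
    # roughly (95.7, 114.3): the observed score is an estimate, not the ability itself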
Imagine that you have selected a suitable measure, administered it
carefully, and scored the protocol. You have probably also worked out
a scaled score with reference to a normative group (see Chapter 3).
But what does that score actually mean? You may be puzzled by this
question, so let us look at some examples.
Our first case study concerns John, who is 34 years old. He is the
managing director of an international company and a consultant for
several other business concerns. He has a pilot's licence and plays
chess on a national level. He was head boy of his school, obtained
distinctions in all his subjects, and was voted the boy most likely to
succeed by his class. John was referred for assessment because he
has recently had difficulties coping with his business commitments.
Our second case study involves Thandi, who is 20 years old and a
prospective university student. Her father is a manual labourer, she
comes from a disadvantaged environment, and the quality of her high
school education was poor. Thandi requested help in deciding
whether she would be capable of doing a science degree.
John and Thandi are assessed with a standardised intelligence
scale and both obtain an IQ of 105. Does this score mean the same
for both individuals? We will return to discuss these examples later,
but they serve as an introduction to this chapter in which we discuss
some of the important factors that have to be taken into account when
interpreting a test score.

17.2 Viewing assessment results in context
As we pointed out right from Chapter 1, a test score is only one
piece of information about how a person performs or behaves.
Therefore, if we look at an individual in terms of a test score only, we
will have a very limited understanding of that person. This is why
Claassen (1997) wrote: ‘Never can a test score be interpreted without
taking note of and understanding the context in which the score was
obtained’ (p. 306).
In addition to the test score, the information in which we are
interested can be obtained by examining the context in which a
person lives. When you think about it, you will realise that people
actually function in several different contexts concurrently. At the
lowest level, there is the biological context, referring to physical
bodily structures and functions, which are the substrata for human
behaviour and experiences (see Section 17.2.1). Then there is the
intrapsychic context, which comprises abilities, emotions, and
personal dispositions (see Section 17.2.2). Biological and intrapsychic
processes are regarded as interdependent components of the
individual as a psychobiological entity. In addition, because people do
not live in a vacuum, we need to consider a third and very important
context, which is the social context (see Section 17.2.3). The social
context refers to aspects of the environments in which we live, such as
our homes and communities, people with whom we interact, work
experiences, as well as cultural and socio-political considerations.
In addition to looking at the effects of the different contexts within
which people function, we also need to examine methodological
considerations such as test administration, which may also influence
test performance and therefore have a bearing on the interpretation of
a test score (see Section 17.3).

17.2.1 The biological context


17.2.1.1 Age-related changes
One of the most obvious factors that affects test performance is
chronological age. This is why measures are developed for certain
age groups based on the skills and interests characteristic of that
particular age group. A good illustration of this point can be found in
infant and preschool measures (see Chapters 10 and 15), which differ
in content according to the age range they cover. Traditionally,
measures for infants (i.e. young children from birth to approximately 2
years of age) include items that largely measure sensory and motor
development, and in this way differ from measures for older children.
Measures for older children (from about 2-and-a-half to 6 years) focus
more on the child’s verbal and conceptual abilities, because
development at this stage is highly verbal and symbolic. If you wonder
why this is so, you can look at well-known theories of development,
such as that of Piaget. You will note that early infant development
proceeds through the medium of sensory and motor skills, such as
hearing and producing sounds, manipulating objects or gaining
increasing muscular and postural control. In fact, Piaget (Piaget &
Inhelder, 1971) describes cognitive development during the first two
years of the infant’s life as the sensorimotor period. After this stage,
children use their verbal and cognitive skills to start reasoning about
the world around them. The content of tests for preschool children
reflects their changing abilities. However, there is not necessarily
continuity in development from infancy to childhood, and the meaning
of an infant’s test score rests on certain assumptions that we have
about the nature of early development (see Box 17.1).
As children grow older, their intelligence increases with respect to
their ability to perform intellectual tasks of increasing difficulty and to
perform them faster (this is referred to as their mental age). For this
reason, standardised measures of intellectual ability have norms for
different age groups. The ratio between mental age and chronological
age is fairly constant up to a certain age and, therefore, IQ scores also
remain fairly constant. Mental age starts levelling off after the age of
16 and it is generally found that performance on most intelligence
tests shows no further noticeable improvement. Adults’ scores may
vary a little as a result of life experience and subsequent accumulated
knowledge, but ability level remains mostly the same. Scores on
intelligence measures therefore stabilise during early adulthood and
then start declining after the age of approximately 50 years, as older
persons react more slowly and are less able to cope with new
situations. It is essential to have knowledge about the normal pattern
of cognitive changes in the elderly so that abnormal changes, such as
those associated with dementia, can be detected and the elderly and
their families can receive the necessary assistance.
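As a concrete illustration of the mental-age idea, the classic ratio IQ is simply mental age divided by chronological age, multiplied by 100 (the values below are hypothetical). Because mental age levels off after about 16, modern tests report deviation IQs based on age norms instead, but the ratio still conveys the underlying logic.

    def ratio_iq(mental_age, chronological_age):
        # Classic ratio IQ: 100 x mental age / chronological age
        return 100 * mental_age / chronological_age

    print(ratio_iq(mental_age=10, chronological_age=8))   # 125.0: ahead of age peers
    print(ratio_iq(mental_age=8, chronological_age=10))   # 80.0: behind age peers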

BOX 17.1 PREDICTABILITY OF INFANT MEASURES OF MENTAL ABILITY


Infant intelligence as assessed by traditional measures is a very poor predictor of
later intelligence. Studies have indicated that there is a low correlation between
traditional psychometric test scores in the first year of life and later IQ. Between
18 months and 4 years of age, the ability of mental tests to predict later IQ
increases gradually and, by age 5, the correlations can be as high as .8 to .9
(Gibbs, 1990). There are several possible explanations for the poor predictability
of infant measures. Firstly, the nature of the developmental process may not be
as continuous as we think. Children may develop at different rates, at different
times. Secondly, infant tests are strongly influenced by fine-motor coordination
and sociability, whereas the contents of intelligence measures for slightly older
children are more closely related to abilities required for successful school
performance. These abilities are typically acquired by approximately 18 months
of age in some rudimentary form, the time at which infant measures start to
increase in predictive ability. In addition, early development is ‘plastic’ or
malleable, and environmental events, over which infants have little control, may
alter their genetically determined developmental course. Socio-economic status
and the quality of the home environment have a strong influence on children’s
intellectual functioning (Richter & Grieve, 1991).
In cases where infants are neurologically impaired and function considerably
below average, there is a greater degree of continuity (and therefore greater
predictability) between functioning in infancy and childhood, due to the biological
constraints on development. Although the severity of the disability can also be
influenced by the quality of the child’s caring environment, the range of outcomes
is restricted, compared to that of a non-impaired child.

The degree of predictability also seems to depend on the type of test that is used. For example, the new infant intelligence measures that are
based on information processing techniques, such as recognition,
memory and attentiveness to the environment, have much greater
predictive validity than the more traditional measures (Gibbs, 1990).

It is important to note that an IQ score in no way fully expresses a person's intelligence (see also Chapter 10). The score is an indication
of performance on a sample of tasks used to assess aspects of
intellectual ability in a particular assessment situation. There are many
non-intellectual (non-cognitive) factors that influence performance on
a measure of intellectual ability – this is the focus of this chapter.
17.2.1.2 Physical impairments
There is a wide variety of physical conditions that can have an effect
on test performance. Consider the following example: While you are
testing Mrs Garies, who is 68 years old, you find that you have to keep
repeating questions and providing explanations, and you wonder if her
understanding is poor. When you calculate her score, you note that it
is low for her age, and, based on her difficulties in understanding, you
conclude that she is mentally impaired. Would you be correct? If you
had discovered in the course of your interview that Mrs Garies has a
hearing problem (but is too proud to admit it), you would realise that
her low test score is a function of her hearing impairment and that she
may not have limited mental ability. As you saw in Chapter 9, by using
specific measures designed for people with disabilities or by adapting
measures and administration procedures, a better indication can be
obtained of their true level of functioning.
As you saw in Chapter 15, an important part of the assessment
process involves taking a thorough medical history. Previous trauma,
such as a stroke or head injury (see Box 17.2), can have a permanent
effect on intellectual ability. In general, poor scores are obtained on
intelligence measures where there is radiological evidence of
neurological impairment.

BOX 17.2 THE EFFECTS OF TRAUMATIC BRAIN INJURY

Being a developing country, South Africa has a higher rate of traumatic brain injury
than the developed Western countries (Naidoo, 2013). There are different types of
brain injury and the consequences of the injuries vary as a result of individual
differences in physiological make-up, personal temperament and ability factors, as
well as social circumstances. When there is a focal injury (a blow to one particular
part of the brain), the impairment is usually limited to those functions served by the
specific area of the brain that is injured. When injury is widespread, there may be
diffuse axonal injury that results in microscopic damage throughout the brain.
This type of injury is frequently sustained in car accidents due to abrupt
acceleration-deceleration and rotational forces, and is often more debilitating
than focal injuries, because a larger area of the brain is involved.
People who have sustained diffuse brain injury, particularly if it is a severe
injury, usually have residual cognitive difficulties in the areas of attention and
concentration, verbal functioning and memory, as well as motor slowing (Lezak,
1995). There are often behavioural difficulties, such as increased irritability and
aggression, decreased motivation, and emotional disorders, such as depression.
There is no one pattern of symptoms and the effects of brain injury vary with age,
the location and severity of the injury, and the personal and social circumstances of
the person concerned. However, the effects of brain injury are generally seen in
poor performance on intelligence tests and neuropsychological measures designed
to assess different functions and abilities. An intelligence measure alone is often not
sufficient to identify brain impairment, because a previously intelligent person may
be able to rely on previously well-learned and established patterns of behaviour and
abilities and still be able to perform adequately while, at the same time, being unable
to cope with new situations and the demands of daily living.

Did you know that mild traumatic brain injury can occur when sports
players suffer numerous concussions? This is why there are concussion-
management programmes in place and sports players are sent for concussion
tests, which often include the ImPACT computerised neurocognitive test
(Shuttleworth-Edwards, Whitefield-Alexander, Radloff, Taylor & Lovell, 2009).

Deficits in cognitive functioning often also accompany serious illnesses, such as cerebral malaria (Holding et al., 2004), meningitis,
chronic fatigue syndrome (Busichio, Tiersky, Deluca & Natelson,
2004), or HIV/Aids infection (Joska, Westgarth-Taylor, Hoare,
Thomas, Paul, Myer & Stein, 2011; Reger, Welsh, Razani, Martin &
Boone, 2002).
Speech impairments present another type of problem. If the
assessment practitioner cannot understand the verbal responses of a
speech-impaired client, lower scores may be awarded than would be
the case if the responses were understood. There is an example of a
young man with cerebral palsy who was sent to an institution for the
severely mentally retarded because his speech was almost
incomprehensible and he had very little motor control. The young man
befriended another patient who slowly began to understand the
sounds he was making and typed out what he was saying on
computer. It was subsequently discovered that the young man had a
great deal of ability and he was helped to write a book on his
experiences. Even if they can speak well, patients with cerebral palsy
or other motor impairments may be penalised by timed (speed) tests,
and a proper indication of their ability can only be obtained by using
alternative measures or adapting standard measures.
In addition to permanent disabilities, there are also transient
physical conditions that can depress test scores. An example of such
a condition is chronic pain, which has a negative effect on the
deployment of existing abilities. Even things that sound trivial, such as
disturbed sleep, can interfere with a person’s functioning and lead to
lowered test scores (Aloia, Arnedt, Davis, Riggs & Byrd, 2004). In
addition, it is important to note whether the person is taking any
medication. For example, excessive intake of painkillers can affect
performance on psychological measures, and some drugs have side
effects like attention difficulties or motor slowing. In some cases,
medication may improve intellectual functioning, for example
antiretroviral therapy improves cognitive functioning in people with
HIV/Aids (Richardson et al., 2002). Chronic alcohol abuse can also
have a negative effect on cognitive functioning and test performance
(Ferrett, Carey, Thomas, Tapert & Fein, 2010; Lezak et al., 2004).

17.2.2 The intrapsychic context


When we consider the intrapsychic context, we look at people’s
experiences and feelings about themselves. It is difficult to separate
the biological and intrapsychic factors because one’s experiences,
interpretations, feelings, and personality depend on biologically based
processes, such as perception, cognition, emotion, and motivation, as
well as the genetic and experiential contributions to these processes.
17.2.2.1 Transient conditions
Transient conditions refer to everyday events that unexpectedly crop
up and upset us sufficiently so that we are ‘not ourselves’ and cannot
perform as well as we normally do. Imagine that you are on your way
to write an examination. A car jumps a red traffic light and collides with
yours. Fortunately, there is relatively minor damage, you are unhurt,
and you go on your way, concentrating on getting to the exam venue
in time. However, when you get your exam paper, you have difficulty
understanding the questions, your concentration wanders, you find it
hard to express yourself properly, and after an hour you give up and
go home. What has happened? Stress and anxiety in any form can
interfere with normal functioning, such as the ability to think clearly, to
concentrate, and to act on plans and intentions. This is particularly
apparent where psychological assessment is concerned. The person
may be anxious because he or she does not know what psychological
measures are, or may be afraid that the measures might reveal or
confirm some weakness or something unpleasant. The person then
cannot think clearly and this results in poor performance. A similar
situation can arise when a person is upset about bereavement in the
family, or worried about a sick child. On a more severe level, post-
traumatic stress experienced by people who have been involved in
violent and often horrifying events can significantly impair functioning
(Beckham, Crawford & Feldman, 1998; Shabalala & Jasson, 2011;
Van Rooyen & Nqweni, 2012).
17.2.2.2 Psychopathology
When psychopathological conditions are present, these usually impair
the person’s assessment performance (Moritz, Ferahli & Naber,
2004). For example, cognitive functioning is negatively affected by
disorders like anorexia and bulimia nervosa (Tchanturia et al., 2004),
anxiety (Van Wijk, 2012), or depression (Popp, Ellison & Reeves,
1994). Depression is frequently manifested in problems with memory
and psychomotor slowing, as well as difficulty with effortful cognitive
tasks (Bostwick, 1994). These conditions may then mask the person’s
true level of functioning on psychological measures.

17.2.3 The social context


We now come to what is probably the most difficult, yet most
important, set of factors that have to be considered when interpreting
assessment performance. At the beginning of this chapter, we said
that a test score has no meaning unless it is viewed in context. Let us
look at some of these aspects relating to the social contexts in which
people live.
17.2.3.1 Schooling
Most intelligence measures are an indirect measure of what the
individual has learned during their lifespan and, as such, are strongly
influenced by schooling experiences (Nell, 2000). This is because
formal education provides us with the problem-solving strategies,
cognitive skills, and knowledge that we need to acquire information
and deal with new problems, which is what is required by traditional
intelligence measures.
It follows that there is a strong relation between scores on
intelligence measures and scholastic and academic achievement
(Holding et al., 2004). One would expect that an adult who has no
more than a Grade 5-level education might obtain a fairly low score on
an intelligence measure, but it would be surprising if a person of the
same age who passed Grade 12 obtained the same low score.
Schooling experiences influence how people think or the reasoning
strategies they use, how they approach problems, their ability to deal
with issues in an independent way, as well as their ability to work
accurately and quickly (see the discussion on test-wiseness in
Section 17.2.3.5).
However, the situation in South Africa is complicated by the fact
that attainment of a particular grade is not necessarily the best
indicator of what a person is able to achieve. There is evidence that it
is not only the grade that counts but the quality of schooling as well.
Poorer-quality schooling makes it difficult for students to compete on
an equal footing with those from more advantaged educational
backgrounds. For example, Grade 12 learners with poorer quality
schooling in the previously disadvantaged areas of South Africa may
not have the same knowledge base or skills as a learner with a Grade
12 from a school in a privileged area (Shuttleworth-Edwards, 2012;
Shuttleworth-Edwards et al., 2004). Similarly, the number of years of
formal education a person has does not mean the same for everyone.
Six years of quality schooling is obviously not the same as six years of
poor schooling. The students with poorer-quality schooling are less
likely to have acquired the same level of cognitive skills (which are
necessary for successful test performance) as the advantaged
students.
What does this mean in practice? If a pupil from a large, poorly run
school in Soweto and a pupil from a well-run school in one of the more
affluent areas in Johannesburg both achieve the same score on a
traditional measure of intelligence, this does not necessarily mean that
they both have the same level of intellectual ability. The student from
Soweto may have a great deal of potential ability, but, because the
schooling experience did not equip the student with the kinds of
information and skills needed for good performance on the measure,
the score obtained may be an underestimation of that pupil’s potential
ability. For this reason, newer intelligence measures are designed to
measure potential ability, rather than relying heavily on knowledge and
skills obtained from formal schooling (see Box 17.3) and also Chapter
10.

BOX 17.3 THE ASSESSMENT OF LEARNING POTENTIAL

Taylor (1994) has developed an assessment approach that assesses learning potential, evaluating the person on future developmental capacity rather than on
prior knowledge and skills. Learning potential measures appear to measure the
construct of fluid intelligence. Traditional measures tend to incorporate questions
that require the individual to draw on knowledge or skills acquired in the formal
educational system, or on other enriching experiences (such as those derived
from reading extensively). The main purpose of learning potential measures is to
incorporate learning exercises in the material, and to assess how effectively the
individual copes with and masters strange tasks. New competencies are
developed in the test room itself, where all individuals have the same opportunity
to acquire the new knowledge and skill. Learning potential measures evaluate
individual differences in terms of learning variables rather than knowledge
variables. Such learning requires the individual to cope with, and attempt to
master, new cognitive challenges. This approach to assessment appears to
offer a great deal of promise in terms of avoiding the influence of prior
knowledge and skills or the lack thereof. However, the claimed differences
between learning potential measures and existing measures have yet to be
unequivocally demonstrated, and the validity of learning potential measures
proven. In terms of the latter, Murphy and Maree (2006) reviewed the use of
measures of learning potential at higher-education institutions in South Africa.
They reported both positive and negative findings with respect to learning
potential measures being able to predict academic success.
In addition, they concluded that when learning potential
measures were used in combination with conventional measures,
the assessment results provided information of a higher quality than
either of the two types of measures on their own.

17.2.3.2 Language
Language is generally regarded as the most important single
moderator of performance on assessment measures (Nell, 2000). This
is because performance on assessment measures could be the
product of language difficulties and not ability factors if a measure is
administered in a language other than the test-taker’s home language.
At some time in our lives, we have probably all had to read or study
something written in a language different to our own and, even though
we may have a working knowledge of that language, we experience
the feeling that it takes longer to process the material in another
language. We can think and discuss so much better in our own
language. By the same token, we would probably therefore perform
better in a test that is written and administered in our home language
than in a test in a second language. In this way, language becomes a
potential source of bias (see Chapters 6, 7 and 9). When a test is
written in a different language, it may present a range of concepts that
are not accessible in our home language.
You may think that the problem can be solved by translating the
measure. Although this is the solution in some instances, this is not
always the case because there are other difficulties (see Chapter 7).
Some languages do not have the concepts and expressions required
by measures and an equivalent form of the measure cannot be
translated. Measures of verbal functioning pose a particular problem.
For example, you cannot merely translate a vocabulary test into
another language because the words may not be of the same level of
difficulty in the two languages.
There is also the problem of what is the most suitable language to
use (see Chapter 9). Let us consider an example. Itumeleng speaks
Northern SeSotho at home, together with a little bit of English and
some isiZulu he has learned from his friends. From Grade 4 his
lessons at school have been presented in English and he is currently
studying through the medium of English at university. He has acquired
most of his knowledge and experience in English and he has a
working knowledge of concepts in English. Would it be better to
assess Itumeleng’s level of ability in English or in his home language?
There is no simple answer to this question. Many students prefer to be
assessed in English but do not have the advantages of people whose
home language is English; they are therefore not comparable. It is
unfortunate that this situation often results in a double disadvantage in
that their competence in both languages is compromised. For
example, students may have some knowledge of English but it may
not be very good and, at the same time, having to study in English can
detract from further development of their home language. This would
have an influence on assessment performance and may mask the
actual ability that the measure is designed to assess. In addition, De
Sousa, Greenop, and Fry (2010) point out that exposure to
monolingual and bilingual-literacy learning contexts can impact on
cognitive processing which, in turn, could result in differing
performance on some cognitive tasks.
An additional problem arises from the fact that a combination of
languages, referred to as a ‘township patois’, is commonly used in the
residential areas surrounding cities and a pure version of one
language is seldom spoken. A child who grows up speaking the patois
would be at a disadvantage if assessed with a formally translated
measure. The solution may lie in bilingual or multilingual assessment
and the possible use of drawings where a test-taker can capture
visually what they are unable to express verbally (Moletsane & Eloff,
2006). The moderating impact of language on assessment
performance remains a challenge for the field of psychometrics in
South Africa (see Chapters 6, 7, and 9).
Furthermore, where an assessment measure is only available in
English but the measure will be applied to test-takers for whom
English is their second or third language, an option is to make use of
language/text simplification, which includes the use of simpler
English words and breaking down the syntax of complex sentences so
that the language used in the measure does not introduce bias or
construct-irrelevant variance (Aluisio, Specia, Gasperin & Scarton,
2010). This approach is commonly used in international assessment
programmes such as the Trends in International Mathematics and
Science Study (TIMSS) and the Programme for International Student
Assessment (PISA). In South Africa, the English version of the
Learning Potential Computerised Adaptive Test (LPCAT) employs
language simplification. The test-taker can ‘hover over’ the English
item with their cursor and simplified wording appears.
It is not only the language background of the test-taker that adds to
the complexity of psychological assessment interpretation in South
Africa. Very often, the psychology professional is not proficient in the
home language of the test-taker, which makes communication difficult.
It is thus not surprising that in a survey of intern psychologists, almost
76 per cent indicated language difficulties with clients as being
problematic (Pillay & Johnston, 2011). Language differences not only
provide communication challenges but also impact on test
performance. For example, Kamara (1971) reports that when children
from Sierra Leone were tested by an assessment practitioner who
spoke their home language their performance was better than when
this was not the case.
17.2.3.3 Culture
Our culture has a pervasive influence on the way we learn, think about
things, and behave. In fact, culture cannot be considered as a
separate factor, exerting an influence apart from other factors in the
person’s environment. It is an integral part of that environment.
Although there may be cultural differences in test scores, there is no
decisive evidence that culture influences competence rather than
performance. In other words, although socio-cultural factors may
affect manifestations of underlying cognitive and emotional processes,
it is unlikely that cultural differences exist in the basic component
cognitive processes (Miller, 1987). For example, empirical studies
have shown that the pattern and sequence of acquisition according to
the Piagetian stages of cognitive development is universal, although
the rate of acquisition may vary among different cultures
(Mwamwenda, 1995). This is why it is important to take cultural
differences into account when interpreting test performance.
The content of any measure will reflect the culture of the people
who designed the measure and the country in which it is to be used
(see Chapters 1, 6, and 7). Clearly, people who do not share the
culture of the test developers will be at a disadvantage when taking
that particular measure. For example, a child’s performance on a
memory task is likely to vary according to the way in which the task is
constructed – the more familiar the nature of the task, the better the
performance (Mistry & Rogoff, 1994). Similarly, Malda, Van de Vijver,
and Temane (2010) developed cognitive tests in which item content
was drawn from both the isiXhosa and Setswana cultures equally. The
two versions of the tests were administered to samples of Afrikaans-
and Setswana-speaking children. They found that the children
performed better on the tests developed for their cultural/language
group, especially when the tests tapped more complex functions.
Despite several attempts, there is no such thing as a culture-free
measure but assessment practitioners can be sensitive to culture
fairness in assessment. For example, items should be examined for
cultural bias (see Chapters 6 and 7). When a measure is administered
to a person who does not share the culture of the people who
developed it and the country in which it was standardised, the
assessment practitioner should always consider that the test score
obtained may not be a good reflection of the person’s true level of
functioning due to the influence of cultural factors. An additional
consideration is that measures and their resultant scores cannot be
assumed to have equivalent meaning for different cultures and
countries (see Chapter 7). For example, a Spanish version of the
Wechsler Adult Intelligence Scales has been developed in Spain, but
there is doubt about its suitability for Spanish-speaking people living in
the US. When considering adopting a measure developed elsewhere,
validation studies have to be undertaken first (see Chapter 7).
According to the belief that behaviour is shaped by prescriptions of
the culture, its values and guiding ideals, it is accepted that cultural
experience influences the meaning of events for an individual and,
therefore, responses will differ among cultures. This is particularly
important in a projective measure, such as the Thematic Apperception
Test, that rests on the assumption that the stories an individual tells
based on the pictorial stimulus material will reveal something of the
person’s personality disposition. For example, a card showing a
picture of a man working on a farm may elicit different responses
according to the culture of the person being tested. One person may
say that the man is poor because he is working the land himself and
has no labourers to help him, while another may say the man is rich
because he has his own land that he can farm. A different response
should not be seen as a deficient (poorer or abnormal) one but should
be viewed as a function of the person’s culture.
In addition to the number of different cultures in South Africa, we
also have the problem of variations in acculturation. Acculturation
refers to the process by which people become assimilated into a
culture. This process occurs at different speeds and is not the same
for all facets of behaviour. We see, for example, that many people
who move from rural areas into cities generally let go of values
associated with a more traditional way of life and adopt a more
Western way of living. But not everything changes at the same pace.
There might be quite a quick change in the way the person dresses or
in habits such as watching television, but the person might still adhere
to more traditional beliefs such as consulting a faith healer rather than
a psychologist when there are problems. There are wide interpersonal
differences in the degree and pace of acculturation. We cannot, for
example, presume that someone who has lived in a city for five years
has adopted Western values. Differences in acculturation have
important implications for performance on assessment measures due
to the impact of the familiarity with the culture on which the measure is
based (see for example Kennepohl, Shore, Nabors & Hanks, 2004).
The closer the person’s personal values and practices are to the
culture, the more appropriate the measure. In turn, if the measure is
appropriate, the person’s test performance is more likely to be an
accurate indication of his or her actual level of functioning. The effects
of acculturation can be seen in the closing of the gap in intelligence
test scores of black and white South African students who have
benefitted from good quality education (Shuttleworth-Edwards et al.,
2004).
In some cases, it becomes very difficult to know which measure is
most appropriate to use. Let us consider the following example.
Nomsa is 12 years old and her home language is isiXhosa. She
attends a school in a middle-class suburb, speaks English at school,
and has friends from different cultural groups. Should she be
assessed with the Individual Scale for isiXhosa-speaking Pupils
because of her home language or with the SSAIS-R (see Chapter 10)
because of her situation? There is no easy answer to this question.
Each person’s situation has to be considered individually, according to
his or her familiarity with the predominant language and culture.
The most important point to remember is that without measures
with culturally relevant content and appropriate norms, fair testing
practices may be compromised. The use of potentially culturally
biased measures may have enormous implications for making
important decisions in people’s lives. For example, people may be
denied educational and occupational opportunities due to a test score
that is not a valid indicator of their ability. However, to add points to a
score to compensate for cultural differences (so-called ‘affirmative
assessment’) does not provide the answer. For instance, if a
psychologist used a particular test in an attempt to determine a
specific deficit, adding points may mask the presence of a deficit and
this would not be responsible practice. The potential contribution of
cultural influences to assessment performance should always be kept
in mind.
17.2.3.4 Environmental factors
Performance on measures of ability is influenced by a variety of
environmental factors. We need to consider distal factors, or those
relating to the wider environment, such as socio-economic status and
the degree of exposure to an enriching social environment, as well as
proximal factors, or those relating to the individual’s immediate
environment, such as socialisation experiences in the home. Why is
this important? Environmental factors determine the types of learning
experiences and opportunities to which we are exposed and this, in
turn, affects our level of ability and the extent to which we are able to
use that ability. This, of course, plays a role in test performance.
The home environment
There are certain child-rearing practices that have been shown to
promote the development of competence and cognitive abilities that
are tapped by traditional measures of development. The kinds of child-
rearing practices shown to promote children’s development are
parental responsivity (including fathers’ involvement) and the provision
of home stimulation in the form of the mother’s active efforts to
facilitate cognitive and language skills, allowing the child to explore the
possibilities of the environment, encouraging interaction with the
environment, and supporting the development of new skills (Grieve &
Richter, 1990). However, there are also indications that universal
assumptions about development do not apply equally to the process
of development in all cultures. Certain culturally sanctioned child-
rearing practices may appear to be contradictory to mainstream
Western practices but should not necessarily be seen as deviant or
detrimental (Garcia Coll & Magnuson, 2000). Therefore, in assessing
children, it is important to understand children and their families in
relation to both the immediate and larger socio-cultural environment
and the ways in which cultural factors operate in the child’s
environment (September, Rich & Roman, 2016). Individual differences
within cultures should also be taken into account. For example,
despite living in disadvantaged communities in South Africa, there are
mothers who are able to structure their children’s environments in
order to promote their children’s mental development, irrespective of
cultural practices and impoverished environments (Richter & Grieve,
1991).
There are certain characteristics of household structure that can
also influence children’s abilities. Overcrowding, for example, has
generally been found to have an adverse effect on cognitive
development as well as on the educational performance of young
children. A study of South African children from previously
disadvantaged communities has suggested that overcrowding
negatively influences children’s cognitive milieus by, for example,
limiting and interrupting exploratory behaviour, decreasing the number
of intimate child-caretaker exchanges, and increasing noise levels
(Richter, 1989). Overcrowding does not only have a negative effect on
children’s development but has been shown to have a detrimental
effect on adults as well. However, overcrowding does not necessarily
have a negative impact; it all depends on whether the person
interprets the situation as stressful. People who have grown up in
crowded homes and are used to crowded places generally find them
less unpleasant than people who are used to a greater degree of
privacy. Overcrowded homes may even offer an advantage to young
children, as a result of the possibility of multiple caretakers to provide
affection and see to their needs, particularly in cases where their
mothers work away from home.
Socio-economic status
Socio-economic status (SES) refers to the broader indices of a
person or family’s social standing. The major indicators of SES are
education, occupation, and income. You may wonder why this would
affect performance on psychological measures. A person’s SES is
important because it determines the type of facilities that are available
(such as schools, libraries, clinics, and other social services), the
opportunities that present themselves, and the attitudes generally
exercised by others in the same socio-economic group. The socio-
economic aspects of the environment thus have a significant influence
on the experiences that moderate abilities, attitudes, and behaviour. It
has been found, for example, that mothers with relatively high levels of
education are more likely to provide stimulating home environments
than mothers who are less educated. The educated mothers are more
likely to read magazines and newspapers (and therefore are more
informed about issues), play actively with their children, help them
learn rhymes and songs, and take them on outings (Richter & Grieve,
1991). These types of activities help to promote children’s
development and well-being. Similarly, talking about current issues
and having books at home help to promote a culture of learning.
Some of the socio-economic factors that are associated with
educational disadvantage are poverty (see Box 17.4), poor health,
malnutrition, inadequate school facilities, poor social adjustment,
absence of cultural stimulation, lack of books and the other aspects of
modern civilisation that contribute to the development of Western
ways of thinking, which is what is measured in traditional
psychological measures. People from disadvantaged environments
often do not perform well on psychological measures because the
tasks are unfamiliar and may be regarded as unimportant.
Achievement in many cognitive measures is a function of the ability to
develop strategies to solve problems in the test situation, as well as of
being motivated to do well.

BOX 17.4 GROWING UP IN POVERTY

Despite the fact that poverty rates are decreasing in South Africa, there are still a
large number of children living in poverty (Barnes, Hall, Sambub, Wright & Zembe-
Mkabile, 2017). For a child, being poor means being at risk for developing both
physical and psychological problems. What are these risks? In comparison with
children from higher socio-economic backgrounds, children from impoverished
backgrounds experience, among other things, a greater number of perinatal
complications, higher incidence of disabilities, greater likelihood of being in foster
care or suffering child abuse, greater risk for conduct disorders, and arrest for
infringements of the law (Barnes, Hall, Sambub, Wright & Zembe-Mkabile, 2017).
One of the major reasons for developmental risk is believed to be associated
with the fact that poverty increases the risk of inattentive or erratic parental care,
abuse, or neglect because a significant proportion of parents living in poverty
lack the personal and social resources to meet their children’s needs (Halpern,
2000). Many young adults living in impoverished inner-city communities have
personal experiences of parental rejection or inadequate nurturance, disrupted
family environments, parental substance abuse, family violence, and difficulties
at school. These adverse conditions have a negative effect on their own abilities
as parents as well as their ability to form adult relationships, the capacity to
complete schooling, to obtain and keep jobs, and to manage their households. In
addition, overstressed parents can adversely influence their children in an
indirect way by communicating their despondency and despair. This does not
mean that all parents in impoverished circumstances are unable to fulfil parenting
roles successfully. However, it suggests that consideration should be given to the
fact that the stress of poverty can play a major role in directly undermining the
efforts of even the most resourceful parents and can therefore detract from the
quality of childcare.

Urbanisation
It is generally found that urban children show superiority over rural
children in terms of cognitive performance (Mwamwenda, 1995). The
reason for this may be that the enriched and invigorating urban
environment stimulates those aspects of cognition usually assessed
by formal psychological measures. It is also more likely that provision
is made for formal schooling in urban rather than rural environments
and access to education at an early age is considered to be the single
most powerful factor facilitating cognitive development. Urbanisation is
also associated with higher parental levels of education and we
pointed out earlier that better-educated mothers are more likely to
provide the kind of home environments that are believed to be
beneficial for children’s cognitive development.
The effects of urbanisation on the changes in people’s ability as
measured by psychological measures can be seen in South African
studies that enable the comparison of test performances of children
over a period of 30 years. When the New South African Group Test
(NSAGT) was standardised in 1954, it was found that English-
speaking pupils were better than Afrikaans-speaking pupils at all age
and education levels by an average of 10 IQ points. When the
successor of the NSAGT, the General Scholastic Aptitude Test
(GSAT), was administered in 1984, a difference of five IQ points in
favour of English-speaking children aged 12, 13, and 14 years was
reported. The reduction of the difference between the groups is
ascribed to the fact that, during the period between 1954 and 1984,
Afrikaans-speaking whites went through a period of rapid urbanisation
during which their level of education and income increased
significantly. The gradual convergence of the scores corresponded
with the more general process of cultural and economic convergence
between the populations, facilitated by the process of urbanisation.
Urbanisation must, however, be considered as a global variable
that cannot offer a complete explanation for the psychological
mechanisms which account for differences in ability.
17.2.3.5 Test-wiseness
All the factors that we have discussed here contribute to a state of
test-wiseness or test-taking skills (Nell, 2000) (see Chapters 8 and
9). Such skills include assuming that you need to work as fast and as
accurately as you can, and that items get progressively more difficult
(Van Ommen, 2005). Many illiterate adults, those with little schooling,
or those who live in isolated communities are not test-wise. This
means that they do not have the skills for, and understanding of, the
requirements for successful test performance. Often they do not
understand the motivation for doing the assessment measure, and are
unfamiliar with the ethic of working quickly and accurately.
Clearly, socio-cultural experiences can affect the quality of
responses to a psychological measure. Assessment techniques
developed for Western children do not always have the same
applicability to non-Western children, depending on the ability that one
is assessing. Culture can determine the availability or priority of
access to the repertoire of responses that the person has, irrespective
of the responses that the examiner requires. Not only language
differences, but also demands posed by test requirements create
problems, because they relate to social skills, the development and
expression of which are encouraged in Western society. Experiences
within particular socio-cultural contexts can significantly influence a
person’s understanding of the meaning of a task, and may place
greater emphasis on competencies and cognitive processes that differ
from those expected in the assessment situation. It must be noted,
however, that, while culture plays a contributory role, it cannot be
isolated from the social and personal contexts in which people live.
As the use of technology in testing is becoming the norm (see
Chapter 14), an additional aspect has been introduced into
considerations of test-wiseness. Test-taking skills now include being
comfortable with, and having the skills needed for, taking a test on a
computer.
Conflicting results have been found in the literature regarding the
influence of past computer experience on performance in
computerised tests (Powers & O’Neill, 1993). Lee (1986) found, for
example, that previous computer experience significantly affects
performance on computerised tests, while Wise, Barnes, Harvey, and
Plake (1989), and Gallagher, Bridgeman, and Cahalan (2000) found
no direct relationship between computer familiarity and computerised
test performance.
There is a wealth of literature that suggests that if test-takers are
thoroughly prepared for the testing and are familiarised with the
computer beforehand, a lack of computer literacy has less impact on
test performance (Lee, 1986; Powers & O’Neill, 1993; Taylor,
Jamieson, Eignor & Kirsch, 1998). However, within the South African
context, where exposure to technology remains uneven and is greater
for people from advantaged backgrounds than disadvantaged
backgrounds, familiarising test-takers with the computer and the
requirements of the test may not necessarily remove all the potential
adverse impacts of a lack of computer literacy or technological
sophistication on test performance. Indeed, there are some indications
that the level of computer familiarity and technological sophistication
impacts on test performance, with the performance of technologically
less sophisticated South African test-takers being the most adversely
affected (Fisher & Broom, 2006; Foxcroft, Watson & Seymour, 2004).
Nonetheless, given the more widespread use of computers and
information technology in the workplace, the increased access to
computers in schools, and the widespread use of sophisticated
cellular and smart-phones and tablets, exposure to technology is
increasing in South African society and the adverse impact on
computer-based test performance is diminishing (Foxcroft, 2004b;
Foxcroft, Watson & Seymour, 2004). International findings support this
conclusion as it has been reported that as computer technology
becomes more accessible in a society, the impact of computer
familiarity (or lack thereof) decreases (e.g. Leeson, 2006).
As indicated in Chapter 14, when using computer-based or
Internet-delivered testing it is the responsibility of the assessment
practitioner to determine whether the test-taker’s level of computer
and digital literacy and sophistication is sufficient for the mode of
testing not to have an adverse impact on test performance. One way
to achieve this is to obtain information about how often a test-taker
has used a computer, what type of cellular phone they have, the type
of technologies they have at home (e.g. television, DVD player), and
the extent to which they are able to operate these various
technologies. Another way is to use an established survey
questionnaire of digital literacy. (For example, you can access a self-
assessment digital literacy questionnaire for first-year students via the
University of Cape Town’s First-Year Experience Project:
http://www.ched.uct.ac.za/first-year-experience-project.) If the
outcome of this exploration is that the test-taker does not have a
sufficient level of computer familiarity, an alternative mode of
assessment needs to be considered. Sometimes there is an
equivalent paper-based version of the test but, increasingly, fewer
paper-based tests are being developed. Also, given that much of what
can be measured using technology cannot be reproduced in a
paper-based version, finding alternatives is becoming a problem.
One alternative might be to develop full practice tests (as opposed to
a few practice examples) that are administered using a dynamic
assessment approach. For example, the Scholastic
Assessment Tests (SATs) and the National Benchmark Tests
(NBTs), used in the United States and South Africa respectively for
university and college admission, have full practice tests and
applicants who want to take them can sign up for coaching courses.
However, the focus is largely on becoming familiar with the content
the test will cover and not computer familiarity. It would be better if
these practice tests could be taken using a dynamic assessment
approach, so test-takers could receive training in both the content
covered in the test as well as in the mode of testing and the skills
required to take a computer-based test. This could potentially reduce
the adverse impact that insufficient computer familiarity could have on
test performance.
Case studies
Before we consider methodological aspects of assessment, this
seems to be a good point to return to our case studies. Go back to the
beginning of the chapter and read the paragraph describing John and
Thandi again. Initially, you may have thought that an IQ score of 105
is average and therefore did not consider it further. But you probably
had some interesting thoughts about these case studies as you
worked through the sections on the different contexts that need to be
taken into account when interpreting test performance.
Let us look at John first. His IQ score was 105, which is an
average score. Given John’s scholastic and occupational history, one
would expect him to obtain a much higher IQ score. After all, he had
good-quality schooling and has remarkable scholastic and
occupational achievements, he does not appear to be socio-
economically disadvantaged, and he seems to be well motivated to
perform in many different spheres of life. The complaint that John has
recently not coped well at work, together with the lower than expected
IQ score, should lead you to look for other factors that may influence
his test performance. You discover that John sustained a concussive
head injury six months ago. He was hospitalised for a few days but,
because he did not sustain any orthopaedic injuries, he thought he
was fine and was back at work within a few weeks. You decide to refer
John for further neuropsychological assessment, which confirms
functioning symptomatic of a closed-head injury and John is referred
for treatment.
Thandi’s situation is somewhat different. Once again, an IQ score
of 105 is average and you may think that Thandi may not be able to
cope with the demands of a science degree. Let us think again.
Thandi and John were assessed with the same measure. Look at the
difference between Thandi and John’s backgrounds. In comparison
with Thandi’s situation, John’s background is clearly far more
conducive to the development of the skills necessary for doing well on
psychological measures in terms of schooling, SES, and socio-cultural
factors. In addition, the measure may not have been developed or
normed for people of Thandi’s background. Importantly, the measure
was probably administered in a language other than Thandi’s home
language. It is, therefore, quite possible that this test score is an
underestimation of Thandi’s true level of ability. It would be preferable
to work together with the student counsellor to suggest a subject
choice that will allow Thandi to demonstrate her capabilities before a
final study direction is chosen. If you do not consider the possible
effects of Thandi’s background on her test performance, you may
prevent her from finding out whether or not she can realise her
dreams.
These case studies should serve as an illustration that a test score
is not always what it seems to be. Each individual has to be viewed
within the parameters of their own situation and background, and all
possible contributory factors need to be considered.

❱❱ ETHICAL DILEMMA CASE STUDY 17.1

Ashwin, a psychologist, was asked to assess 40 applicants for a position as a call-
centre operator. All the measures that he planned to use were only available in
English. However, only 15 of the applicants were mother-tongue English speakers
while the home language of the others was either isiXhosa or isiZulu. The
applicants also varied in that some had completed Grade 12 while others
had completed only Grade 10. Given that they had all at least had 10 years
of schooling in which they took English as a subject, Ashwin decided that it
was acceptable for him to administer the measures in English to all the test-
takers. When the test results were analysed, Ashwin found that the English-
speaking applicants scored higher than the isiXhosa- and isiZulu-speaking
applicants. Consequently, only three of the top-performing English-
speaking applicants were shortlisted for the position.

Questions:
(a) Was it appropriate for Ashwin to administer the measures in
English? Motivate your answer.
(b) Could the measures have been biased? Were the test results
used fairly? Motivate your answer.
(c) How could Ashwin have approached this dilemma differently to
avoid ethical transgressions?

17.3 Methodological considerations


The process of psychological assessment is dynamic and can be
influenced by many factors. Most assessment practitioners try to
ensure that the test results provide the most accurate reflection of the
abilities or characteristics being measured but, in this process, there
are many factors that can influence the outcome of the measure. This
is why we need to pay attention to the important role of test
administration, the influence of the assessment practitioner, the status
of the person taking the measure, standardised procedures, test bias,
and construct validity.

17.3.1 Test administration and standardised procedures


In Chapter 9, we discussed in detail the importance of adhering to
standardised instructions when administering psychological
measures, and so we will not repeat it here. However, each
assessment situation is unique and assessment practitioners may, for
a variety of reasons, have to adjust the standardised procedures
slightly. Flexibility and minor adjustments to test procedures are often
desirable or even necessary. However, changes should not be
random but should be done deliberately and for a reason, according to
the needs of the person being tested.
Adjustments to standardised procedures are often required when
assessing children, people with physical and mental disabilities, and
brain-injured individuals, for example (see Chapters 9 and 15). Any
changes in standardised procedures should be noted in written form
so that they can be taken into consideration when interpreting test
performance.

17.3.2 Interpreting patterns in test scores


At the beginning of this chapter, we pointed out that a test score is just
a number. It is, therefore, important that one should not place too
much emphasis on a specific score that a test-taker obtains in a single
assessment session (see Chapter 1). You know that there are many
potential sources of error and these have to be explored before you
can decide whether the test score really represents the person’s level
of ability. A person’s profile of scores should be interpreted only after
investigating all available personal information, including
biographical and clinical history, evaluation by other professionals, as
well as test results (see Chapter 1).
The most important reason why a score, such as an IQ, cannot be
interpreted as constituting an exact quantification of an attribute of an
individual is that we have to take measurement error into account. The
standard error of measurement (SEM) indicates the band of error
around each obtained score, and examiners should be aware of the
SEM for each subtest before interpreting the test-taker’s score (see
Chapter 3). The possibility of errors in measurement also has to be
taken into account when examining the differences between subtest
scores on a composite scale, such as an intelligence scale. The
observed differences between scores could be due to chance and
may not be a reflection of real differences. We need to examine this
situation further.
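Before doing so, a purely hypothetical numerical illustration of the
SEM may help. Suppose a subtest has a standard deviation of 3
scaled-score points and a reliability coefficient of .91 (illustrative
figures only, not drawn from any particular measure). Classical test
theory gives SEM = SD × √(1 − reliability) = 3 × √(1 − .91) = 0.9. An
obtained scaled score of 10 is therefore better thought of as a band
of roughly 10 ± 1.8 (about two SEMs, corresponding to an
approximately 95 per cent confidence band) than as an exact value.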
Some psychologists place a great deal of importance on the
differences in an individual’s scores on the various subtests of an
intelligence scale, for example, or between verbal and performance
intelligence quotients. You must remember that there can be a wide
variation in subtest performance in normal individuals and that this
variation, or scatter, does not necessarily indicate pathology. Scatter
among subtests on an intelligence scale is, in itself, not an indicator of
brain dysfunction, but rather a characteristic of the cognitive
functioning of the individual and can indicate strengths and
weaknesses. For example, a child who obtains much lower scores on
non-verbal subtests, in comparison to verbal subtest scores, may
have a perceptual problem.
The best way to examine and interpret differences in subtest
scores is to examine the incidence of subtest differences in the
standardisation sample for a particular measure. In the US, for
example, one study (Matarazzo, 1990) revealed an average subtest
scatter (difference between highest and lowest subtest scaled scores)
of six points on the Wechsler Adult Intelligence Scale. In this situation,
a difference of six points is average but the difference can range from
1 to 16 points in normal individuals. However, there are times when a
difference of a few scaled points between two subtest scores can be
clinically meaningful and should be explored further. For example, if
an English teacher obtains scaled scores of 12 for an Arithmetic
subtest and 9 for Vocabulary, this would be suspicious as one would
expect an equally high score on Vocabulary due to the teacher’s
language ability. Again, it is important to integrate test scores with
personal and social history, as well as relevant medical and clinical
findings, before a meaningful interpretation can be made.
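A similar hedged illustration applies to score differences. Under
classical test theory, the standard error of the difference between two
subtest scores is √(SEM₁² + SEM₂²). If two hypothetical subtests
each have an SEM of about 1 scaled-score point, the standard error
of their difference is √(1² + 1²) ≈ 1.4 points; a 3-point difference is
then roughly two standard errors (unlikely to reflect chance alone),
whereas a 1-point difference could easily arise from measurement
error.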
The same applies to differences between verbal and performance
intelligence quotients. Previously, it was thought that a marked
discrepancy between verbal and performance quotients was an
indication of pathology. However, this is not necessarily the case,
particularly when it is the only indicator of possible impairment. This
was confirmed in a South African study (Hay & Pieters, 1994) that
investigated large differences between verbal and performance
intelligence quotients using the SSAIS-R in a psychiatric population.
There was a mean difference of 23 points between verbal and non-
verbal intelligence quotients, but there was no relation between the
size of the discrepancy and the presence or absence of brain damage.
This implies that a pattern of scores on its own is not definitive. The
scores have to be considered together with results from other
measures and information from other sources. Again, the important
point is to interpret scores in context.

17.3.3 The influence of the assessment practitioner


The most important requirement for good assessment procedure is
that the assessment practitioner should be well prepared (see Chapter
9). Assessment is a psychological intervention, not a mechanical
process. For this reason, it is important for the assessment practitioner
to establish rapport with the test-taker (see Chapter 9). The
establishment of rapport is particularly important when assessing
children (see Chapter 9). It is difficult to establish rapport when the
assessment practitioner’s energy is being taken up with finding the
right test or page, reading the instructions, and trying to work out what
to do next. The assessment practitioner needs to be familiar with the
procedure and at ease in the situation. Failure to establish rapport can
have a negative impact on test performance in that the test-taker’s
ability may be underestimated or distorted.
Through experience, assessment practitioners learn that they have
to respond differently to different people, according to their needs and
problems, and react accordingly. The assessment practitioner also
has to be aware of the possible effect of his or her own expectations
on the test-taker’s responses. This is a special instance of the self-
fulfilling prophecy (Terre’Blanche & Durrheim, 1999). Through subtle
postural and facial cues, for example, assessment practitioners can
convey their expectations and thereby limit or encourage the test-
taker’s responses. The assessment practitioner may also fail to
assess a test-taker properly due to certain expectations. For example,
if the examiner believes the test-taker is low-functioning, the examiner
may not explore the limits of the test-taker’s ability. By expecting a low
level of performance, the examiner may fail to determine the test-
taker’s potential.

17.3.4 Status of the test-taker


17.3.4.1 Anxiety, test-taking effort, and motivation
Do you remember how you felt when you last took a test? The
majority of people will have to take a psychological measure at some
stage in their lives, either at school or at work. Most people who have
to take psychological measures experience a certain amount of
anxiety at the idea of being tested. Although a certain amount of
anxiety is beneficial because it increases arousal, a large amount of
anxiety has a detrimental effect on test performance. Anxiety can be
reduced by the assessment practitioner’s own manner of interacting
with the test-taker, establishing rapport, and seeing to a well-
organised, smoothly running assessment procedure (see Chapter 9).
The validity of assessment results and the conclusions made from
them is closely linked to the effort that test-takers put into performing
to the best of their ability. The effort that test-takers put into giving of
their best when completing a psychological measure has become ‘an
increasingly important topic in testing literature’ (Barry et al., 2010, p.
342). Barry et al. (2010, p. 342) argue that test-taking effort will be
greater in contexts where the assessment has ‘direct personal
consequences’ and is thus ‘high-stakes’ in nature, and that it will be
lower in ‘low-stakes’ assessment where the outcome of the
assessment does not have personal consequences.
If the person is not motivated to take a measure, his or her
performance will be lower than the level of actual functioning. It is
important that test-takers should know why they are taking a measure
and what the benefit will be. The person may be suspicious regarding
the purpose of the measure and may lack motivation to perform well.
17.3.4.2 Faking bad, malingering, and faking good
A person may deliberately perform badly on measures if it is in his or
her interests to do so. This is referred to as malingering or faking
bad (see Chapter 12). It has been suggested that individuals who
seek monetary compensation for injuries or those who are on trial for
criminal offences may malinger or consciously do badly on measures.
Although it is not possible to detect malingering in every case, there
are techniques that can help to identify malingering (see, for example,
Ross, Krukowski, Putnam & Adams, 2003).

BOX 17.5 MALINGERING

Malingering, also referred to as sick-role enactment, may occur when people
respond, deliberately or unconsciously, in ways that they think will indicate the
presence of an impairment, such as a brain injury or memory impairment.
Tests for malingering have been put forward but these have not been
unequivocally successful (Lezak, 2004). It is difficult to detect malingering
from one measure, but the experienced psychologist can usually identify a
problem by considering all sources of information, such as comparing different
assessment results and examining test performance in relation to interview
data. The psychologist can be guided by two general considerations. One is
whether the individual has some motivation to fake test performance and the
second is whether the overall pattern of assessment results is suspicious in
the light of other information that is known about the person (Gregory, 2010).
For example, consider the situation of a person who is claiming compensation
after a motor-vehicle accident. He complains about a memory problem and
does poorly on a memory test, yet in an interview demonstrates considerably
intact memory. This would warrant further investigation.

Another form of faking is seen in the tendency for people to respond to
test items in a socially desirable way. For example, most people would
respond true to the statement ‘I would help a blind person who asks
for assistance to cross the street’ because it supports a socially
desirable attribute. The socially desirable response may not be
accurate because many people do not really enjoy helping others. To
counter this problem, the construction of tests may include relatively
subtle or socially neutral items to reduce the possibility that test-takers
will fake or give socially desirable responses. Another approach is to
construct special scales for the purpose of measuring faking or
socially desirable responses, which can be embedded in an inventory
or administered in a battery of measures (see Chapter 12).
There are ways of controlling for faking in the construction of
psychological measures, but the assessment practitioner has to be
aware of the possibility that responses will not always be a true
reflection of the test-taker’s attitudes or characteristics.
17.3.4.3 Cheating
A major challenge facing the validity of assessment results in large-
scale testing used for pre-selection or selection purposes in industry
and for admissions purposes in higher education is to identify the
number of test-takers that cheat (which is an extreme form of faking)
and to introduce mechanisms to reduce cheating (or test fraud).
Caveon Test Security, an international leader in data forensics and
test-security services, has found that in the fields of educational and
skills testing, 5 to 20 per cent of test-takers are likely to cheat in
some form or another (www.caveon.com). What can be
done about this? Assessment practitioners and organisations could
conduct regular security audits to ensure that there are no security
breaches that could result in test materials being ‘leaked’ to test-
takers. For computer-based and Internet-delivered tests, data forensic
audits could be undertaken to analyse the extent of aberrant or
unusual patterns of scores which could flag possible cheating.
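To give a feel for what such a screen might look like (an illustration
only; commercial data forensic services use far more sophisticated
statistical models, and the rule and data below are entirely
hypothetical), a minimal Python sketch could flag test-takers whose
scores sit unusually far above the rest of a testing session:

from statistics import mean, stdev

def flag_unusual_scores(scores, threshold=2.0):
    # Hypothetical screening rule: return the positions of scores more
    # than `threshold` standard deviations above the session mean.
    m, s = mean(scores), stdev(scores)
    if s == 0:
        return []
    return [i for i, score in enumerate(scores) if (score - m) / s > threshold]

# Hypothetical session in which one score stands far above the rest
session_scores = [52, 48, 55, 50, 49, 95, 51]
print(flag_unusual_scores(session_scores))  # prints [5]: the sixth test-taker

A flag of this kind would never be proof of cheating on its own; it
would merely indicate a pattern worth investigating further, alongside
other evidence.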
Part of obtaining informed consent is to not only spell out the
purpose of the assessment, the confidentiality of the results, and what
feedback will be provided, but to also address the matter of what the
consequences will be of cheating or test fraud. To underline the latter,
test-takers can be requested to sign an honesty contract (Burke,
2006), which could include:
• the expectation that the test-taker will respond honestly and without
help from anyone else
• the expectation that the test-taker will respect the honesty of the
assessment and not share the content of the measures administered
with anyone
• that the candidate accepts that verification of the assessment
results will occur and that the test-taker may be required to take a
verification assessment.
Lastly, some organisations use various methods (e.g. polygraphs,
reference checks, interviews, and integrity tests) to assess how
honestly applicants have responded, especially in pre-employment
screening assessment, ‘as most employers want to minimise
dishonest and counterproductive behavior among their employees’
(Coyne & Bartram, 2002, p. 15). Coyne and Bartram (2002) assert
that the most controversial of these methods is the use of integrity
tests to predict dishonest behaviour. While integrity tests have been
found to be valid, reliable, and fair in general, the main concern with
such tests is whether it is possible for an employer to judge an
applicant’s honesty on the basis of a self-report measure.
17.3.4.4 Practice effects
Given the widespread use of tests for selection purposes and the
competency-based approach that most organisations use, some
applicants are likely to be asked to take the same test each time they
apply for a job. A similar situation arises when learners apply to study
at higher-education institutions as most admissions tests cover the
same competencies or constructs. What effect does taking a test a
number of times have on the performance of the test-taker? That is,
are there ‘practice effects’? Reeve and Lam (2007) assert that
practice effects can be sizeable but that there has been insufficient
research to explore the reasons for the gains made due to retaking the
same test. For example, they argue that test scores could increase
when retaking the test due to cognitive, affective, and motivational
factors. Reeve and Lam further conclude that in ‘the context of
predicting individual differences in score gains across repeated
administrations, it is likely that those with higher self-efficacy (perhaps
because of successful past performances) are more likely to gain
more from practice than those without such efficacious beliefs’ (p.
229). As practice (through having taken the test previously) can
impact on test performance, practitioners should be aware of this and
organisations should rethink retesting policies.
17.3.5 Bias and construct validity
In earlier chapters you were introduced to the notion of validity
(Chapter 5) and test bias (Chapters 6 and 7). It is important to
remember that a measure is only valid for the particular purpose for
which it was designed. When a measure is used in a different way, the
validity must be determined for the context in which it is to be used.
For example, several short forms of the Wechsler Adult Intelligence
Scale (WAIS) have been developed and these have been found to
correlate highly with the full battery. However, one short form of the
WAIS may be useful in assessing patients with Parkinson’s disease
but may overestimate IQ scores in patients with brain tumours. This is
because different combinations of subtests are often used for specific
populations. When interpreting assessment results, it is important to
establish the construct validity of the measure for your particular
purpose, otherwise your results may be invalid.
An additional consideration in interpreting assessment results is
that of the age of the measure. Both test content and norms need to
be updated regularly, due to technological developments and naturally
occurring social changes (see Chapter 6). The use of outdated norms
and outdated test content can lead to inaccurate and invalid
assessment results. Furthermore, the items should not function in
different ways for different cultures (see Chapters 6 and 7). If there is
evidence of test and item bias for a particular measure for a particular
group of people, the assessment results will not provide an accurate
reflection of their functioning.

17.4 Conclusion
What have we learned from all this? In this chapter, you have been
introduced to several ideas about test scores and assessment results.
Perhaps the most important is that a test score on its own is limited in
what it can tell us about a person’s behaviour. The test score
represents a person’s performance. The person’s ability or trait is
inferred from the score. In order for scores to be meaningful, we need
to look at the scores in context and in terms of the psychometric
properties of the measure. Many problems and tricky issues have
been raised, particularly with regard to applying psychological
measures in a heterogeneous society such as South Africa. Given
these difficulties, you may be wondering whether psychological
assessment is desirable, or even whether it is worth using
psychological measures at all. The answer, a qualified ‘Yes’, lies in the
ability of the person administering and interpreting the measures to do
so competently and responsibly. Psychological assessment is a
necessary condition for equity and efficient management of personal
development. By not using measures, you could be denying many
individuals access to resources or opportunities. By not using
measures responsibly, you could be denying individuals their rights.

❱❱ CRITICAL THINKING CHALLENGE 17.1

Write a short article for a local newspaper in which you motivate the continued
use of psychological measures against the backdrop of the numerous factors
that affect the accuracy and reliability of the assessment results.

CHECK YOUR PROGRESS 17.1


17.1 Why do standardised measures of intellectual ability have
norms for different age groups?
17.2 Provide examples of transient conditions that can affect test performance.
17.3 Case study: Becky is a 9-year-old girl whose home language is
Afrikaans. Her father worked as a labourer on the railways but has
been unemployed for 10 years. Her mother is a housewife but has a
history of psychiatric illness. Becky is having difficulties at school and
has been referred for assessment because her teacher wants more
information about her intellectual ability. What steps should you take
to ensure that you obtain the best possible picture of her ability?
17.4 Critically discuss the statement that the number of years of
schooling completed relates to performance on intelligence tests.
17.5 What evidence do we have that acculturation affects test performance?
17.6 What is the effect of using a test with outdated norms?
Chapter 18
On the road to culturally meaningful psychological assessment in South Africa
CHERYL FOXCROFT

CHAPTER OUTCOMES
By the end of this chapter you will be able to:
› describe the progress being made on the road towards psychological
assessment becoming more culturally meaningful
› describe key aspects to consider when developing and adapting culturally
appropriate measures
› outline key aspects of the roadmap to guide and sustain the progress made.

18.1 Introduction
For now, we have arrived at the final stop of the journey through
psychological assessment territory. In this chapter we will reflect on
the extent to which progress is being made on the road towards
psychological assessment becoming more culturally meaningful.
Thereafter, a roadmap to guide the next steps of this journey will be
outlined. Lastly, concluding remarks will be made regarding the
resilience and determination needed to continue this daunting journey
of integrating African-centred and Western-oriented approaches to
ensure that psychological assessment is applied in culturally
meaningful and ethical ways in South Africa.

18.2 Culturally meaningful psychological assessment: The journey so far
In Chapters 1 and 2, it was indicated that:
• there is a long history of Western-oriented assessment measures
being transported from elsewhere and then being uncritically adopted
for use in South Africa
• owing to our colonial and apartheid past, even when indigenous
measures were developed they were based on Western-oriented
theories and assessment approaches, and were largely standardised
and normed for white South Africans
• given the dearth of available assessment measures, the practice of
inappropriately using measures developed for one group to assess
test-takers from another group arose. This led to misuse of
assessment measures and potentially inaccurate conclusions being
reached from test results
• in the process, the value, relevance and meaningfulness of
psychological assessment for all South Africans was sharply
questioned and abolishing psychological assessment was a strong
consideration
• questioning the relevance of psychological assessment in South
Africa, together with the dawn of the democratic South Africa,
provided the impetus to accelerate the process of transforming
psychological assessment to be contextually appropriate.
As you page through this book, you will realise that transforming
psychological assessment in South Africa, to make it more culturally
meaningful and relevant for all South Africans, is a slow process but
that there is evidence that the pace is increasing. Do you agree? Let
us look at some of the aspects where progress is being made and
some challenges that have arisen and have often required innovative
responses.
18.2.1 Conceptualising how to combine African-centred and Western-oriented approaches
To transform psychological assessment in South Africa, we needed a
shared vision on the approach(es) to follow. That culturally sensitive,
meaningful and relevant measures were needed was clear. But how to
arrive at such measures was the subject of robust dialogue, and many
conference papers and publications. It took the best part of two
decades to reach a growing consensus about the conceptual
approach that would be most appropriate.
There were initially two seemingly opposing schools of thought
about the approach needed to develop culturally appropriate
psychological assessment measures. Over time, a third school of
thought has emerged. These three schools of thought are summarised
in Table 18.1.
As pointed out in Chapter 1, prominent African scholars have
advocated for the ‘both and’ approach (e.g. Moletsane, 2016; Mpofu,
Ntinda & Oakland, 2012; Zuilkowski et al., 2016). That is, using
both indigenously developed, African-centred measures and Western-
oriented measures that have been appropriately adapted for the South
African context is the preferred approach. Instead of a ‘one size fits all’
approach, finding an appropriate approach to assess a construct is
what is important. Some measures that tap universal constructs can
be appropriately adapted for use in the multicultural South African
context. (Page back to Chapters 12 to 15 to remind yourself of some
of these measures.) For other measures, a Western theory is
transportable and could serve as a basis to develop an indigenous
measure (e.g. consult Chapter 10 to review the development of the
LPCAT that was based on dynamic assessment principles and
learning potential theory). For yet other measures, a theory
needs to be generated ‘from the ground up’ in the SA context before a
measure based on it is developed (e.g. consult Chapter 12 to review
the development of the SAPI).
Generally, slow but steady progress is being made. The number of
indigenous and adapted measures available is increasing, research
into measures used here is expanding, and the number of books that
capture the evolving journey has grown
significantly. In the process, the work being done in South Africa is
attracting international attention. An example of this is that 17 South
African sources are cited by authors from other countries in the recent
ITC International Handbook of Testing and Assessment. Furthermore,
the way in which the South African Personality Inventory (SAPI) has
been developed is being widely acknowledged internationally as a
good example of how to develop a measure for a multicultural context
using a combined emic-etic approach.
Table 18.1 African-centred and adapted, Western-oriented tests: ‘Either or’, or ‘both and’?

Rationale
• Indigenous, African-centred tests (emic): Given the importance of culture in the multicultural post-apartheid SA context, the only route to culturally sensitive and relevant psychological assessment is to develop and use indigenous measures.
• Adapted, Western-oriented tests (etic): African-centred psychology is part of the universal discipline of psychology. By adapting Western-oriented measures, the content and language of assessment is meaningfully adapted for use in the multicultural SA context, and a wealth of international research on the measure can be drawn on.
• Both African-centred and adapted Western-oriented tests available: Instead of choosing between only employing an emic or an etic approach to psychological measures, the position that has emerged is to develop and use both indigenous (culture-specific; emic) and appropriately adapted Western-oriented measures (that tap universal constructs; etic). This combined approach provides opportunities to advance African-centred psychological assessment while also contributing to and shaping the global discipline of psychological assessment.

Challenges
• Indigenous, African-centred tests (emic): Africa and SA have many cultures. Do separate measures have to be developed for each culture? This approach limits cross-cultural comparability studies within and across countries, as the measures are not comparable. It also takes time and is costly to develop a new test.
• Adapted, Western-oriented tests (etic): Even though the content of the measure is more culturally, linguistically and contextually appropriate, the adapted measure remains rooted in the Western-oriented theory and research that informed its construct domains and content. These theories and research do not necessarily completely ‘fit’ the SA context and require critical interrogation for the results to be interpreted appropriately.
• Both African-centred and adapted Western-oriented tests available: We lack sufficient expertise and the financial resources needed to develop indigenous measures and adapt international measures (especially for children). This slows down the pace at which culturally appropriate measures can be provided. There is also a paucity of African-centred theories of intelligence, personality, etc. Such theories are needed to inform both indigenous test development and the adaptation of existing tests using an integrated emic-etic approach.
Let’s briefly critically reflect on the specific progress being made in both developing indigenous measures and adapting Western-oriented measures for the South African context.
18.2.2 Progress in developing indigenous, African-centred measures
In Chapter 1 you were requested to keep notes on progress being
made to provide more measures that are African-centred as you
journeyed through this book. Look back at your notes. Examples of
some of the African-centred measures developed since the start of the
twenty-first century are summarised in Table 18.2. This is not an
exhaustive list. You might want to add to it as you read Chapters 10 to
15 again.
Reflecting on the psychological measures highlighted in Table
18.2, the following themes emerge:
Table 18.2 Examples of indigenous, African-centred measures

Personality
• South African Personality Inventory (SAPI) – for adults (see 12.3.9 in Chapter 12): An emic-driven, lexical approach was used to identify personality descriptors, and then common personality dimensions across cultural and language groups were derived for the SA context. Items are culturally appropriate. The SAPI is setting the benchmark internationally for developing culturally appropriate measures.
• Basic Traits Inventory (BTI) − for adults and adolescents; paper-based and online versions (see 12.3.3 in Chapter 12): Based on the Big Five personality factors (for which there is universal support; etic); items developed for the multicultural and multilingual SA context (emic-driven).

Cognitive
• Learning Potential Computerised Adaptive Test (LPCAT) – for children from 11 years and upwards, including students (see 10.2.1 and 10.4.2.2 in Chapter 10): Based on principles of dynamic assessment and theories of learning potential. Items developed for the multicultural and multilingual SA context and available in the 11 official languages. It is a computerised test based on adaptive testing principles (no paper-based version).

Specific cognitive functions − memory
• Parallel versions of indigenous short-term memory tasks for Afrikaans- and Setswana-speaking cultural groups, where some of the content differs across the groups to allow for greater familiarity while still tapping the same construct (see 10.7 in Chapter 10): Culturally sensitive content was found to have a positive impact on test performance, especially for more complex cognitive tasks. Developed for research purposes but has promise to be used in applied contexts.

Career − Interests
• South African Career Interest Inventory (SACII) (see 13.2.3 in Chapter 13): Based on Holland’s circular model of vocational interests (universal support), but items were developed for the cross-cultural SA context. Psychometric results indicate that more appropriate assessment can be achieved from developing an indigenous SA measure of Holland’s theory than adapting the new version of the theory-based SDS that was developed in the USA.
• My System of Career Influences (MSCI) – qualitative measure for adults and adolescents (see 13.4.1 in Chapter 13): Based on the Systems Theory Framework – cross-culturally applicable, as the MSCI was designed for use across cultural and national groups, with SA being one of the contexts where the development took place.

Well-being
• Wellness Questionnaire for Higher Education (WQHE) – for university students (see 11.5.1 in Chapter 11): Based on a Western theory (Hettler’s six-dimensional model of wellness), but the items were developed for the multicultural and multilingual SA student context. Latest research suggests insufficient construct equivalence across groups. Consequently, a study is underway to evolve an understanding of wellness in SA that can inform the dimensions and content of a revised version of the WQHE.
• South African Burnout Scale – for adults (see 11.5.2 in Chapter 11): Draws on three dimensions of burnout in international literature, but the items were developed for the multicultural and multilingual SA context. Initial development was in an agricultural context.
• There is value in using a combined emic-etic approach: drawing on universal domains or theories and then developing items that are culturally and linguistically appropriate (e.g. BTI, LPCAT, SACII, South African Burnout Scale, WQHE).
• Equally, there is value in developing a contextualised understanding or theory in the SA context before developing the dimensions and items of a psychological assessment measure (e.g. SAPI).
• In some instances, instead of adapting an international measure, a more appropriate and psychometrically sound measure can be developed for the SA context based on the same theory. For example, both the latest international version of the Self-directed Search and the indigenously developed South African Career Interest Inventory (SACII) are based on Holland’s circular model of vocational interests. There is evidence that the SACII provides findings that are better aligned to the theory than those of the previously adapted version of the SDS in the SA context.
• The option of creating parallel versions of indigenous cognitive measures for different cultural or language groups, where some of the content differs to allow for greater familiarity while still tapping the same construct, is an intriguing one (e.g. Malda, Van de Vijver & Temane, 2010). This approach could enhance the cultural meaningfulness of measures and ethical assessment practice in SA.
• International measures that are specifically developed for cross-cultural and cross-national use, with SA researchers involved in the development phase, provide another route to meaningful assessment measures in SA (e.g. MSCI).
• While the majority of indigenous measures available are quantitative in nature, measures with a more qualitative orientation show promise in cross-cultural settings (e.g. MSCI).
• Few indigenous computer-based measures have been developed (e.g. LPCAT), while more questionnaires and inventories have both computer- and paper-based versions. There is little evidence about delivering tests to mobile devices.
• Most of the indigenous measures developed in the democratic South Africa are for adults and adolescents. Very few cognitive and developmental tests have been developed for young children.
18.2.3 Progress in adapting and researching Western-oriented measures for the South African context
There is an encouraging, growing trend of published research studies
in which foreign or imported measures are researched in a range of
settings (clinical, neuropsychological, educational, organisational). For
example, in a special issue of the South African Journal of
Psychology, 42(3), eight articles were devoted to providing
psychometric and research data on imported tests such as the Tinker
Toy Test, the Wisconsin Card Sorting Test, the Stroop Color and
Word Test and the WAIS III and IV, which are used in a
neuropsychological context. The current upswing in the publication of
research into the cultural appropriateness of measures in scientific
journals, and via the electronic media, should continue to be
encouraged.
Examples of measures where psychometric research yields
promising findings in the multicultural and multilingual local context,
but where content or construct domains have not yet been adapted or
national norms established (if appropriate), include:
• Raven’s Advanced Progressive Matrices
• Well-being measures: Depression, Anxiety and Stress Scale (DASS), Beck Depression Inventory
Examples of some of the Western-oriented measures transported
from elsewhere and adapted for use in SA since the start of the
twenty-first century are summarised in Table 18.3. This is not an
exhaustive list. You might want to add to it as you read Chapters 10 to
15 again.
Table 18.3 Examples of adapted, Western-oriented measures and assessment procedures

Personality
• 16 Personality Factor Questionnaire – Fifth Edition (16PF5) (see 12.3.1 in Chapter 12): Studies of the SA adaptation of the 16PF5 with various language and cultural groups suggest that it measures the same broad personality traits across the different groups.

Cognitive
• WAIS-IVSA (16 years to 59 years and 11 months) (see 10.4.1.1 and 10.4.1.2 in Chapter 10): Requires Grade 12 level of English proficiency. Norms are applicable for English second-language speakers. A criticism is that quality of education is a strong moderating variable and the norms do not address this.

Well-being
• Centre for Epidemiologic Studies Depression Scale (CES-D) (see 11.5.1 in Chapter 11): Afrikaans, isiZulu, and isiXhosa language versions of a 10-item CES-D have been established and validated.

Administration procedures
• Rorschach Inkblot test (see 9.3.3 in Chapter 9): Adjusted the seating arrangements so that children could choose the one they were more comfortable with.

Response possibilities
• Rorschach – expanded the ways in which a response could be given (see 12.4.3 in Chapter 12): Allowed test-takers to respond in the language they are comfortable with and to use drawings to enhance their description of what they ‘saw’ in the inkblot. This increased the number of responses, making a richer interpretation possible.
Reflecting on the psychological measures highlighted in Table 18.3, the following themes emerge:
• Adapting some versions of an international measure yields better psychometric results than others. For example, the 16PF5 that was adapted for the SA context appears to be measuring the same constructs across cultural groups, whereas this was not the case for adaptations of previous versions of the 16PF (see 12.3.1 in Chapter 12).
• The matter of how to delineate norm groups and how to account for moderator variables impacting on test performance remains a thorny issue in a country where test-takers are from diverse language, cultural, socio-economic, and educational backgrounds. This will be discussed further in Section 18.3.
• The diversity of the SA population also makes it challenging to adapt a psychological measure for all the different language and cultural groups at the same time. This is very resource- and time-intensive. Consequently, most test developers tackle a few languages and cultural groups at a time (e.g. the CES-D has been adapted for three language groups).
• As pointed out in Chapter 7, test adaptation includes aspects of how the measure is administered and how the test-taker must respond. Limited research into these aspects is available. Nonetheless, the work done by Moletsane (2004) and Moletsane and Eloff (2006) in terms of the Adjusted Rorschach Comprehensive Procedures that they developed provides insight into the importance of researching and adjusting seating arrangements and response possibilities, which resulted in a more meaningful assessment experience and a richer set of assessment responses.
❱❱ CRITICAL THINKING CHALLENGE 18.1
Brainstorm with a friend or classmate how you would make a decision as to
whether to develop an indigenous psychological assessment measure or
adapt an existing international measure. What will inform your decision?
18.3 Roadmap for the next phase of the journey
Figure 18.1 outlines three core aspects that form part of the next
phase of the journey to enhance and expand culturally meaningful
assessment in South Africa so that it becomes the norm.
Each of these aspects will be briefly discussed below.
18.3.1 Culturally appropriate measures
Many more culturally appropriate measures need to be added to the
list of classified psychological tests published by the Professional
Board for Psychology. If not, the transformation of psychological
assessment will be slowed down and even halted. What are some of
the things that can be done to sustain and increase the pace of
developing and adapting measures that are culturally appropriate?
Some ideas in this regard are provided below.
Figure 18.1 Core aspects needed to expand culturally meaningful assessment in South Africa
18.3.1.1 Routes to developing indigenous measures and
adapting Western-oriented measures
A common understanding has generally been reached that the best
approach to create a pool of culturally appropriate measures is to both
develop indigenous, African-centred measures and adapt appropriate
Western-oriented measures for the South African context. How do we
set about undertaking such development and adaptation?
As was pointed out in Chapters 6 and 7, whether developing an
indigenous measure or adapting an imported, international one, a
multidisciplinary team needs to be assembled, which should include
cultural experts from the communities concerned as well as language
experts.
Chapter 6 provides a detailed account of how to develop a
psychological assessment measure, while Section 7.7 in Chapter 7
provides more depth regarding the development of tests for a
multicultural context. By now, you also have a broader understanding of
etic and emic approaches in cross-cultural assessment. One of two
routes can be followed when using a combined emic-etic approach to
conceptualise and delineate the construct domain of an indigenous
test in the development phase. The first route is termed ‘etic-driven’
(Iliescu & Ispas, 2016, p. 137). Here a universal conceptualisation of
the construct domain is applied to which emic, culture-specific traits or
sub-domains are added. The second route is termed ‘emic-driven’
(Iliescu & Ispas, 2016, p. 137). Following this route requires that the
construct domain of interest be explored ‘from the ground up’ so that a
culture-specific (emic) understanding of a construct develops. The
resultant structure that emerges of the construct can then be
compared to structures of the construct that are thought to be
universal (etic). This provides insight into aspects of the construct that
are similar across cultures and those that are unique.
When figuring out which constructs to tap in an indigenous
measure, the findings of research studies could also prove useful. For
example, Cockcroft, Bloch, and Moolla (2016) compared the
performance of South African school beginners from high and low
socio-economic groups on working memory and vocabulary
measures. They found that ‘socio-economic status accounts for
considerable variance in vocabulary measures, while it explains only
small amounts of variance in working memory measures’ (p. 199).
Thus, when developing an indigenous cognitive test battery, including
working memory as one of the sub-domains can result in a fairer
assessment if children from differing socio-economic groups are to be
assessed on the battery.
In Table 7.1 in Chapter 7, eight kinds of test adaptation were
indicated (Van de Vijver, 2016). When adapting a measure for use in
the multicultural and multilingual South African context, the first step is
to decide which types of adaptation will be needed. It is considered
good practice to include a combination of types of adaptation (Van de
Vijver, 2016). The subsequent steps to adapt a measure were outlined
in Chapter 7 (especially in 7.3.1 and 7.6). Readers are also referred to
Van de Vijver (2016) and the International Test Commission’s
Guidelines for Translating and Adapting Tests − 2nd edition
(International Test Commission, 2016) for further guidance on how to
adapt a measure.
Whether an indigenous measure is to be developed or an imported
one adapted, progress needs to be made regarding assessment
measures being available in multiple languages. There is growing
evidence that more measures are being adapted and translated,
which results in different language versions of a measure being
available. But to have 11 different language versions of a measure
available is quite challenging, costly, and time-consuming. Given that
many South African children are educated in a language that is not
their mother tongue, the possibility of bi- and even tri-lingual
assessment should be explored (see 9.2.7 in Chapter 9).
An innovation that has been seen in the research arena has been
the development of parallel versions of indigenous cognitive measures
for different cultural or language groups. The same construct is
measured, but some of the content differs to allow for greater
familiarity for each specific group, and scores are standardised across
the groups to allow for direct comparisons (Malda, Van de Vijver &
Temane, 2010). Given the multicultural and multilingual nature of our
country, it is hoped that parallel versions of indigenous measures will
be developed for use in applied clinical, counselling, educational, and
organisational contexts as such assessment will be fairer and more
culturally meaningful.
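How scores from such parallel versions can be placed on a common scale is easy to sketch. As a minimal illustration (not necessarily the exact procedure followed in the cited study), a raw score x on group g’s version of a task can be standardised against that group’s own mean and standard deviation and then re-expressed on a shared scale:

\[ z_g = \frac{x - \bar{x}_g}{s_g}, \qquad T = 50 + 10\,z_g \]

Because each group is referenced against its own familiar-content version, a T-score of 60 carries the same relative meaning (one standard deviation above that group’s mean) in every group, which is what makes direct comparison possible.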
18.3.1.2 Inspiration for African-centred item development
An aspect that can be a challenge in indigenous test development is where to seek inspiration for developing African-centred item content. Some ideas that surfaced in this regard during our assessment journey in this book are:
• Turn to African artwork to develop geometric shapes used in problem-solving cognitive assessment tasks (Bekwa, 2016; Bekwa & De Beer, 2012) (see Figure 10.1 in Chapter 10).
• Study the context so that appropriate emic tasks can be developed. For example, Reuning and Wortley’s (1973) study of the San in the Kalahari Desert resulted in the development of a three-dimensional maze presented on a wooden board that had to be solved. This was a more familiar task than the unfamiliar one of solving a paper-based maze using a pencil (see 2.4.3 in Chapter 2).
• Instead of using two-dimensional, visually represented patterns on a page, use familiar three-dimensional objects such as beans, stones, and caps from bottles that can be packed out into patterns that need to be completed (e.g. Zuilkowski et al., 2016) (see 6.2.2.1 in Chapter 6).
• Where drawing something using pencil and paper is unfamiliar, drawing something using a stick on wet clay might be a more familiar task, as could be producing an object visually by making it out of clay or string rather than drawing it (e.g. Mpofu, 2002) (see 15.3.4.1 in Chapter 15).
• Activities undertaken in therapy can take on the form of an assessment technique. For example, creative arts such as painting, using clay, music, dance and movement, and using sand-trays and sand play, or using the Mmogo-method® (Roos, 2008, 2012) provide opportunities to gain insight into a child’s behaviour and level of functioning (see 15.3.4.5 in Chapter 15). Other than the insights gained, it is possible that these creative activities can spark ideas for African-centred item development. But we have to find ways to harvest these ideas. Perhaps test-development teams could consult therapists who use these creative activities during therapy to be inspired and develop ideas that can be translated into items?
• In career-construction and life-design counselling, clients develop stories and narratives about themselves based on prompting questions and tasks. From this, the counsellor and client then ‘co-construct a life-career portrait’ (e.g. Hartung, 2015, p. 116) (see 13.5.1 in Chapter 13). Given the strong oral tradition in African cultures, test tasks that require storytelling and even listening to stories could prove to be meaningful and familiar activities.
18.3.1.3 Delineating norm groups
The best way or ways to delineate and compile norm groups must be
found (Grieve & Van Eeden, 1997; Huysamen, 1986; Nell, 1997;
Schepers, 1999). Grieve and Van Eeden (1997) suggest that the
factors that moderate test performance in a heterogeneous society
provide a useful starting point when trying to identify homogenous
norm groups. Factors such as language proficiency, cultural
background, extent of urbanisation, socio-economic level, extent and
quality of formal education, as well as ‘test-wiseness’ have been found
to be important moderator variables on performance in psychological
measures in South Africa (Nell, 1997; Shuttleworth-Edwards, 2017).
Huysamen (1986) furthermore argues that, other than developing
separate norms for different target groups, which is fairly costly, the
other route would be to select only those items that do not unfairly
discriminate between the different groups and to then use only one
(common) norm group. In this instance, the more unique aspects of a
particular group may be discarded in favour of what they have in
common with other groups. Another potential solution is to use the
typical performance of particular subgroups of the larger population as
a criterion or standard against which a test-taker’s test performance
can be compared (Raiford, Weiss, Rolfhus & Coalson, 2005, 2008).
The outcome of the comparison could assist in reaching a diagnosis
or developing an intervention. The Advanced Reading block in Section
10.4.1.2 in Chapter 10 problematises the matter of delineating norm
groups in a multicultural context.
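A small, purely hypothetical calculation illustrates why this matters. Suppose a test-taker obtains a raw score of 42 on a cognitive measure. Against a common norm group with a mean of 50 and a standard deviation of 8, the standard score is

\[ z = \frac{42 - 50}{8} = -1.00, \]

which falls at roughly the 16th percentile. Against a subgroup norm reflecting a comparable quality of education, say with a mean of 40 and the same standard deviation, the identical raw score yields

\[ z = \frac{42 - 40}{8} = +0.25, \]

or roughly the 60th percentile. The same performance is thus read as ‘below average’ against one reference group and ‘slightly above average’ against another, which is why the delineation of norm groups has real diagnostic consequences.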
18.3.1.4 Qualitative measures
There is a strong call for the development and use of qualitative
measures as an alternative approach to quantitative, psychometric-
based psychological assessment. While some of these calls come
from those who focus on psychotherapy, some innovative work has
been done in this regard in the field of career counselling (see Section
13.5 in Chapter 13). Getting clients to narrate career stories or to
reflect on career influences in structured ways while still allowing for
flexibility has resulted in a range of qualitative career-assessment
measures (McMahon et al., 2003). Given the structured approach, it
is possible to use the same assessment tasks and activities with
different clients, and the outcomes or outputs (e.g. stories) lend
themselves to being used as research data. Going forward, capacity-
building is needed in structuring or systematising qualitative
assessment tasks to produce qualitative assessment measures. In
this regard, McMahon et al. (2003) provide suggestions to guide the
development of qualitative career-assessment measures (see Section
13.4.1 in Chapter 13). A further matter that will require attention is the
standards or requirements that qualitative assessment measures must
meet. Some possible minimum requirements could be:
• The identification and delineation of the construct(s) tapped by the measure need to be evaluated and found to be credible and trustworthy by a panel of experts. This provides qualitative ‘evidence’ of construct validity.
• Sensitivity reviews of items or tasks, as well as of the process of presenting and responding to the assessment tasks for designated subgroups of interest (DSI), need to be undertaken using expert panels and focus groups with samples from the target population. This should ensure that the assessment tasks or activities are not biased against any of the DSIs. The usual procedures used in qualitative research to establish the trustworthiness and consistency of the findings from the sensitivity review should be used. Consult Aston (2006) for an example of conducting a sensitivity review of the WAIS-III.
• Guidelines for making sense of the output or artefact produced and assisting clients to ‘connect the dots’ and, in so doing, co-developing a holistic sense of their development, career pathways, and so on, could be helpful.
What other requirements can you think of?
18.3.1.5 Creating repositories of research studies into assessment measures
While progress is being made, it is still on a
relatively small scale as researchers often focus on locally/regionally
available samples and adapt tests for cultures and language groups in
a region. To derive maximum benefit from these initiatives, cross-
cultural research databases on commonly used South African
measures should be established and Web sites should be developed
to assist researchers to gain access to these databases easily. In this
way, information on specific psychological measures from research
studies across the country can be pooled.
It should be noted that if such repositories are to be created, the
informed consent obtained by psychological assessment practitioners
from test-takers must include the use of assessment results for
research purposes. Test-takers will need to be assured that their
assessment data will remain anonymous in this case. This can be
done by not including the test-taker’s name, identification number, and
address in the repository.
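As a minimal sketch of what such de-identification could look like in practice (the field names below are hypothetical, chosen only to mirror the identifiers mentioned above):

# Illustrative Python sketch: strip direct identifiers before a record
# enters a pooled research repository. All field names are hypothetical.
DIRECT_IDENTIFIERS = {"name", "id_number", "address"}

def de_identify(record: dict) -> dict:
    """Return a copy of an assessment record without direct identifiers."""
    return {key: value for key, value in record.items()
            if key not in DIRECT_IDENTIFIERS}

client_record = {
    "name": "A. Example",            # dummy data for illustration
    "id_number": "0000000000000",
    "address": "12 Example Street",
    "measure": "CES-D-10",
    "language_version": "isiXhosa",
    "raw_score": 14,
}

# Only research-relevant fields remain:
# {"measure": "CES-D-10", "language_version": "isiXhosa", "raw_score": 14}
repository_record = de_identify(client_record)

Removing direct identifiers in this way is, of course, only one part of protecting anonymity; informed consent for the research use of the data must still be obtained beforehand.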
18.3.2 Capacity-building in test development and adaptation
There are currently too few expert test developers in South Africa.
Why? One answer is that this is not a recognised category in the
profession of psychology. Consequently, while some postgraduate
programmes in research psychology include modules in developing
psychological assessment measures, there is no programme that
exclusively focuses on developing the next generation of test
developers in South Africa.
Given the contribution that psychological assessment can
potentially make towards the effective utilisation of human resources
in South African companies (against the background of South Africa’s
poor rating in the World Economic Forum’s Global Competitiveness
Report), South Africa needs more test development and measurement
experts. The science of psychological assessment (psychometric
theory) can be applied beyond the psychometric assessment of
individual attributes in the narrow traditional sense. Psychometric
theory can also be applied to the assessment of work groups,
business organisations, and the outcomes of training and education
programmes. A concerted effort should be made by all departments of
psychology (general, clinical, counselling, industrial, educational, and
research) at our local universities to develop and establish expertise in
test development. Furthermore, in view of the fact that computerised
measures are increasingly going to dominate the field of assessment,
interdisciplinary training programmes offered by psychology
departments, in collaboration with computer science and information
technology departments, are becoming a necessity if we want to
develop expertise in computerised test development.
As assessment methodologies and technologies as well as
advanced statistical techniques have evolved rapidly over the last
decade or two, Byrne et al. (2009) argue that knowledge and applied
learning related to these aspects should be included in postgraduate
psychology curricula. They also highlight the inclusion of topics related
to test equivalence, bias, normative equivalence, indigenous test
development, the use of imported tests and test adaptation, and
multinational test use, especially in industrial and organisational
psychology curricula.
A question that arises is: Do psychology departments have
academics with sufficient psychometric expertise to teach the topics
that Byrne et al. (2009) propose? The answer is probably that there is
not sufficient expertise at all universities with postgraduate psychology
programmes. One solution could be to create a pool of test-
development and psychometric experts who can teach modules in this
field so that universities can draw on them. Another solution is that, as
the various test publishing agencies in the country have research and
development sections, some of these psychometric researchers and
test developers could be contracted to teach modules in test
development and adaptation, and psychometric research at
universities. Test publishing agencies could also consider providing
internships for postgraduate students specialising in the development
of psychological assessment measures.
18.3.3 Foster good ethical practices and address assessment misuse
As the misuse of assessment measures is a worldwide concern
(Bartram & Coyne, 1998), ways of addressing this problem need to be
found. Both in South Africa and internationally, there is a strong
perception that while legal regulations need to be in place for
purposes of gross or severe abuse, the introduction of more
regulations will not stamp out abuse in the long run. As we pointed out
in Chapter 8, the onus needs to be placed on the assessment
practitioner to ensure that ethical assessment practices become the
norm. How can this be achieved? Let us explore a few possibilities in
this regard.
Assessment practitioners’ awareness of good assessment practices and of assessment misuse can be developed through:
• providing guidelines. The International Test Commission’s International Guidelines for Test-use (Version 2000), the ITC’s Guidelines for Computer-based and Internet-delivered Testing (2006), the Ethical Rules of Conduct (Department of Health, 2006), the Code of Practice for Psychological and Other Similar Assessment in the Workplace (Society for Industrial and Organisational Psychology in South Africa, 2006a) and the Guidelines for the Validation of Assessment Procedures (Society for Industrial and Organisational Psychology in South Africa, 2006b) are examples of guidelines that are available to guide practice standards for assessment practitioners and other stakeholders in South Africa. These guidelines should be included in the training of assessment practitioners and should be highlighted in CPD workshops
• gathering a collection of case studies that reflect instances of good and poor assessment practices and test misuse (Robertson & Eyde, 1993). These case studies could be captured digitally and there could be online interactions and debate around them
• developing and maintaining dedicated web sites where information about psychological measures, issues in assessment, and best practices is available (e.g. www.psychtesting.org.uk and http://www.apa.org). This should not only provide practitioners with information, but also the public (e.g. what are the main features of a psychological test; what are the rights and responsibilities of test-takers)
• undertaking more research studies into ethical assessment practices in a multicultural and multilingual, African context and publishing the findings (Laher & Cockcroft, 2014). This will provide students and assessment practitioners with greater insight into ethical assessment practices
• exposing assessment practitioners, in university training programmes, to both the science and the ‘art’ of assessment so that they function as clinicians and not just technicians (Ahmed & Pillay, 2004). While the use of traditional assessment measures represents the more quantitative component of assessment, the broad gathering of relevant information and the synthesis and integration of all assessment information to describe and understand human functioning represents the ‘art’ of assessment. Furthermore, given that assessment is both an ‘art’ and a science, the lecturer/trainer needs to be conversant with the theoretical and psychometric underpinnings of assessment, as well as being a practitioner who can use real-life examples and experiences to highlight professional, ethical practice aspects. Thus, those who teach assessment modules in professional training programmes need to be experienced practitioners themselves with a broad assessment knowledge base
• gaining exposure to ethical practices when working in multicultural contexts and developing cultural sensitivity. This should ideally be done during undergraduate and postgraduate studies in psychology. CPD sessions could also focus on this
• instituting ‘[t]own hall focus group discussions … to explore selected ethical challenges experienced by psychologists conducting assessments’ (Levin & Buckett, 2011, p. 4). Not only does this provide a safe space to dialogue, but potential solutions or ways of managing difficult ethical dilemmas can also be generated in the process. CPD sessions and roundtable discussions at conferences provide opportunities for such town hall focus group discussions
• regular supervision and practitioner support groups, which provide opportunities to discuss and share experiences around ethical challenges and test misuse in assessment.
❱❱ ETHICAL DILEMMA CASE STUDY 18.1
Cyril is a clinical psychologist in private practice. He is mindful of the fact that information needs to be gathered to validate and update the psychological measures that he uses, particularly the imported ones where he has translated the items for the language groups that his practice serves. Consequently, he creates a database which includes the name of the client and the relevant test results. In his informed consent contract with his client he only includes information on the purpose of the assessment, how feedback will be provided, and how confidentiality of the results will be maintained.
Questions
(a) Under what circumstances is it ethically acceptable to use an
imported test in clinical assessment practice?
(b) Can psychological test data be used for research purposes
without the consent of the client? Motivate your answer.
(c) If Cyril submits his database to a national project aimed at pooling research
studies into specific tests across the country, what should he do to ensure that
he does not breach the confidentiality assurance that he gave to his clients?
18.4 Conclusion
When he was freed from his lengthy period of imprisonment, the late
Nelson Mandela knew that the long journey to a democratic South
Africa was not yet over. He thus had the following to say: ‘The long
walk continues’ (Hatang & Venter, 2011, p. 117).
In this chapter we have reflected on the progress that we have
made in transforming psychological assessment in South Africa to
become more culturally meaningful. While steady progress is being
made, there is undoubtedly still a long road ahead. We have outlined
some of the core aspects that need to be tackled in the next phase of
the journey, some of which are quite daunting. However, having taken
up the challenge of transforming psychological assessment in South
Africa, we cannot stop now. The long walk has to continue until
psychological assessment is perceived as being meaningful, relevant,
and of value for all cultural groups in South Africa.
But, for the moment, we have reached the end of our journey into
the world of psychological assessment in this book. We hope that you
have enjoyed the journey and that you share our enthusiasm for the
valuable role that psychological assessment can play in assisting
individuals, organisations, educational and training institutions, and
the broader South African society. However, given the sensitive nature
of psychological assessment and the potential for misuse, we hope
that this book has provided you with some of the more important
signposts regarding fair and ethical assessment practices in the
multicultural and multilingual South African context. When you begin,
or continue, your real-life journey as an assessment practitioner, we
hope that the lessons that you have learned will transcend ‘book
knowledge’ and will make a difference to the way in which you
practise psychological assessment and contribute to its transformation
in South Africa.
References
Abrahams, F. (1996). The cross-cultural comparability of the 16
Personality Factor Inventory (16PF). Unpublished doctoral
dissertation, University of South Africa.
Abrahams, F. (2002, December). The (un)fair usage of the 16PF
(SA92) in South Africa: A response to C.H. Prinsloo and I.
Ebersohn. South African Journal of Psychology, 32(3), pp. 58−61.
Abrahams, F. & Mauer, K.F. (1999a). The comparability of the
constructs of the 16PF in the South African Context. SA Journal of
Industrial Psychology, 25(1), pp. 53–59.
Abrahams, F. & Mauer, K.F. (1999b). Qualitative and statistical
impacts of home language on responses to the items of the 16
Personality Factor Questionnaire (16PF) in South Africa. SA
Journal of Psychology, 29, pp. 76–86.
Abubakar, A., Holding, P., Van de Vijver, F., Bomu, G., & Van Baar, A.
(2010). Developmental monitoring using caregiver reports in a
resource-limited setting: The case of Kilifi, Kenya. Foundation Acta
Pædiatrica/Acta Pædiatrica, 99(2), pp. 291–297.
doi:10.1111/j.1651-2227.2009.01561.x
Adler, D.A., McLaughlin, T.J. & Rogers, W.H. (2006). Job
performance deficits due to depression. The American Journal of
Psychiatry, 163(9), pp. 1569–1576.
Aguinis, H. (2013). Performance Management (3rd edition).
Englewood Cliffs, NJ: Prentice-Hall.
Aguinis, H., Culpepper, S. A., & Pierce, C.A. (2010). Revival of test
bias research in preemployment testing. Journal of Applied
Psychology, 95(4), pp. 648–680. doi: 10.1037/a0018714.
Aguinis, H., & Stone-Romero, E.F. (1997). Methodological artifacts in
moderated multiple regression and their effects on statistical
power. Journal of Applied Psychology, 82, pp. 192–206.
Ahmed, R. & Pillay, A.L. (2004). Reviewing clinical psychology training
in the post-apartheid period: Have we made any progress? SA
Journal of Psychology, 34(4), pp. 630–656.
Aiken, L.R. & Groth-Marnat, G. (2005). Psychological testing and
assessment (12th edition). Massachusetts: Allyn and Bacon, Inc.
Ajayi, O.R., Matthews, G.B., Taylor, M., Kvalsvig, J.D., Davidson, L.,
Kauchali, S. & Mellins, C. (2017). Structural equation modeling of
the effects of family, preschool, and stunting on the cognitive
development of school children. Front. Nutr, May 15, 4:17. doi:
10.3389/fnut.2017.00017
Allen, M.J. & Yen, W.M. (1979). Introduction to measurement theory.
Monterey, CA: Brooks/Cole Publishing Company.
Aloia, M., Arnedt, J., Davis, J., Riggs, L., & Byrd, D. (2004).
Neuropsychological sequelae of obstructive sleep apnea-
hypopnea syndrome: A critical review. Journal of the International
Neuropsychological Society, 10(5), pp. 772–785.
Alonso-Arbiol, I. & Van de Vijver, F.J.R. (2010). A historical analysis of
European Journal of Psychological Assessment. European Journal
of Psychological Assessment, 26(4), pp. 238–247.
Alpern, G.D., Boll, T.J., & Shearer, M. (1980). The Developmental
Profile II. Aspen, CO: Psychological Development Publications.
Aluisio, S., Specia, L., Gasperin, C. & Scarton, C. (2010). Readability
assessment for text simplification. Proceedings of the NAACL HLT
2010 Fifth Workshop on Innovative Use of NLP for Building
Educational Applications, pages 1–9, Los Angeles, California, June
2010. Association for Computational Linguistics. Retrieved on
15 July 2012 from
http://rer.sagepub.com/content/74/1/1.full.pdf+html
Aluja, A., & Blanch, A. (2004). Replicability of first-order 16PF-5
factors: An analysis of three parcelling methods. Personality and
Individual Differences, 37, pp. 667–677.
American Educational Research Association, American Psychological
Association, & National Council on Measurement in Education.
(1999). Standards for educational and psychological testing.
Washington, DC: Author.
American Psychological Association. (2003). Principles for the
validation and use of personnel selection procedures (4th ed.).
Washington, DC: Author.
Anastasi, A. (1986). Evolving concepts of test validation. Annual
Review of Psychology, 37, pp. 1–15.
Anastasi, A. & Urbina, S. (1997). Psychological testing (7th edition).
Upper Saddle River, NJ: Prentice-Hall.
Angoff, W.H. (1993). Perspectives on differential item functioning
methodology. In P.W. Holland & H. Wainer (Eds.), Differential item
functioning (pp. 3–23). Hillsdale, NJ: Lawrence Erlbaum.
Antonovsky, A. (1979). Health, stress and coping. San Francisco:
Jossey-Bass.
Antonovsky, A. (1987). Unravelling the mystery of health: How people
manage stress and stay well. San Francisco: Jossey-Bass.
Antonovsky, A. (1996). The salutogenic model as a theory to guide
health promotion. Health Promotion International, 11(1), pp. 11–
18.
Argyle, M. & Crossland, J. (1987). The dimensions of positive emotions. British
Journal of Social Psychology, 26, pp. 127–137.
Arthur, W., Day, E.A., Mcnelly, T.L. & Edens, P.S. (2003). A meta‐
analysis of the criterion‐related validity of assessment center
dimensions. Personnel Psychology, 56(1), pp. 125−154.
doi:10.1111/j.1744-6570.2003.tb00146.x
Arthur, W., Doverspike, D., Muñoz, J., Taylor, J.E. & Carr, A.E.
(2014). The use of mobile devices in high-stakes remotely
delivered assessments and testing. International Journal of
Selection and Assessment, 22(2), pp. 113−123.
Asiwe, D.N., Jorgensen, L.I., & Hill, C. (2014). The development and
investigation of the psychometric properties of a burnout scale
within a South African agricultural research institution. SA Journal
of Industrial Psychology, 40(1), Art. #1194. doi:
http://dx.doi.org/10.4102/sajip.v40i1.1194
Association of Test Publishers of South Africa and Another, Saville
and Holdsworth South Africa v The Chairperson of the
Professional Board of Psychology, unreported case number
12942/09 North Gauteng High Court, 2010 (pp. 1–22). Retrieved on
21 November 2012 from
http://atp.org.za/assets/files/JUDGMENT.PDF.
Aston, S. (2006). A qualitative bias review of the adaptation of the
WAIS-III for English-speaking South Africans. Unpublished
master’s dissertation, Nelson Mandela Metropolitan University.
Retrieved on 13 September 2018 from
http://hdl.handle.net/20.500.11892/159016
ATP (Association of Test Publishers of South Africa) v the President
of SA, the Minister of Labour and the HPCSA, case number 89564
(North Gauteng High Court, May 2, 2017).
Austin, J.T., & Villanova, P. (1992). The criterion problem: 1917–1992.
Journal of Applied Psychology, 77, pp. 836–874.
Badenhorst, F.H. (1986). Die bruikbaarheid van die Senior Suid-
Afrikaanse Individuele Skaal vir die evaluering van Blanke
Afrikaanssprekende, hardhorende kinders (The suitability of the
Senior South African Individual Scale for evaluating white
Afrikaans-speaking children who are hard of hearing). Unpublished
master’s dissertation, University of Stellenbosch.
Bagnato, S.J. & Neisworth, J.T. (1991). Assessment for early
intervention: Best practices for professionals. New York: The
Guilford Press.
Baicker, K., Cutler, D., & Song, Z. (2010). Workplace wellness
programs can generate savings. Health Affairs, 29(2), pp. 304–
311.
Bakker, A. B., & Demerouti, E. (2007). The job demands-resources
model: State of the art. Journal of Managerial Psychology, 22(3),
pp. 309–328.
Bakker, A., & Demerouti, E. (2008). Towards a model of work
engagement. Career Development International, 13, pp. 209–223.
Bakker, A.B. & Leiter, M.P. (2010). Work engagement: A handbook of
essential theory and research. Hove: Psychology Press.
Bakker, A.B., & Oerlemans, W.G.M. (2010). Subjective well-being in
organizations. In Cameron, K., & Spreitzer, G. (Eds.). Handbook of
positive organizational scholarship (pp. 1–31). Oxford University
Press.
Baran, S. (1972). Projective Personality Test/Projektiewe Persoonlik-
heidstoets. Johannesburg: National Institute for Personnel
Research.
Barnes, H., Hall, K., Sambub, W., Wright, G. & Zembe-Mkabile, W.
(2017). Review of research evidence on child poverty in South
Africa. Children’s Institute, University of Cape Town. Retrieved on
13 September 2018 from
http://www.ci.uct.ac.za/sites/default/files/image_tool/images/367/pu
Baron, E. C., Davies, T., & Lund, C. (2017). Validation of the 10-item
Centre for Epidemiological Studies Depression Scale (CES-D-10) in
Zulu, Xhosa and Afrikaans populations in South Africa. BMC
Psychiatry, 17(1), 6.
Bar-On, R. (1997). BarOn Emotional Quotient Inventory Technical
Manual. Toronto: Multi-Health Systems Inc.
Barry, C.L., Horst, S.J., Finney, S.J., Brown, A.R., & Kopp, J.P.
(2010). Do examinees have similar test-taking effort? A high-
stakes question for low-stakes testing. International Journal of
Testing, 10(4), pp. 342–363.
Bartram, D. (2003). Policy on the design of test-related products for
executive users (Version 1.0). London: SHL.
Bartram, D. (2004). Assessment in organisations. Applied Psychology:
An International Review, 53(2), pp. 237−259.
Bartram, D. (2008a). Global versus local personality norms: Logical
and methodological issues. Paper presented at the 6th Conference
of the International Test Commission, Liverpool, UK, 14–16 July
2008. (Can be accessed online at www.intestcom.org).
Bartram, D. (2008b). An ISO standard for assessment in work and
organizational settings. In J. Bogg (Ed.), Testing International, 20,
December 2008, pp. 9–10. (Can be accessed online at
www.intestcom.org).
Bartram, D. (2012). Concluding thoughts on the internationalization of
test reviews. International Journal of Testing, 12, pp.
195–201.
Bartram, D. & Coyne, I. (1998). The ITC/EFPPA survey of testing and
test use in countries world-wide: Narrative report. Unpublished
manuscript, University of Hull, United Kingdom.
Bass, C. & Fleury, D. (2010). The (not so) hidden costs of member
absences. Slide presentation at the meeting of Global Benefits
Outsourcing Conference 2011, Mercer.
Bayley, N. (1969). Manual for the Bayley Scales of Infant
Development. New York: The Psychological Corporation.
Beck, A.T., Steer, R.A. & Brown, G.K. (1996). BDI II Manual. Boston:
The Psychological Corporation.
Becker, J. (2013). Preliminary psychometric investigation of the
Minnesota Multiphasic Personality Inventory-2TM in the South
African context. Johannesburg, South Africa: JvR Psychometrics.
Beckham, J., Crawford, A., & Feldman, M. (1998). Trail Making Test
performance in Vietnam combat veterans with and without post
traumatic stress disorder. Journal of Traumatic Stress, 11(4), pp.
811–820.
Beery, K.M., Buktenica, N.A., & Beery, N.A. (2005). Beery™-
Buktenica Developmental Test of Visual-Motor Integration
(Beery™-VMI) (5th edition). Florida: Psychological Assessment
Resources.
Bekhet, A.K., Zauszniewski, J.A. & Nakhla, W.E. (2008). Happiness:
theoretical and empirical considerations. Nursing Forum, 43(1), pp.
12–24.
Bekwa, N.N. (2016). The development and evaluation of Africanised
items for multicultural cognitive assessment. Unpublished doctoral
thesis, University of South Africa.
Bekwa, N.N., & De Beer, M. (2012). Multicultural cognitive
assessment: Exploring the use of African artefacts as inspiration
for the development of new items. Poster presented at the 21st
International Association for Cross-cultural Psychology, 17–21
July, 2012, Stellenbosch, South Africa.
Berry, C.M., Sackett, P.R. & Wiemann, S. (2007). A review of recent
developments in integrity test research. Personnel Psychology,
60(2), pp. 271−301.
Berry, J.W. (2000). Cross-cultural psychology: A symbiosis of cultural
and comparative approaches. Asian Journal of Social Psychology,
3, pp. 197–205. doi: 10.1111%2F1467-839X.00064
Berry, J.W., Poortinga, Y.H., Segall, M.H., & Dasen, P.R. (2002).
Cross-cultural psychology: Research and applications (2nd ed.).
Cambridge: Cambridge University Press.
Bertelsmann, E. (2010, May 19). Association of Test Publishers of
South Africa and Saville and Holdsworth, case number 12942/09,
Pretoria, Gauteng, Republic of South Africa: North Gauteng High
Court.
Bertua, C., Anderson, N., & Salgado, J.F. (2005). The predictive
validity of cognitive ability tests: A UK meta-analysis. Journal of
Occupational and Organizational Psychology, 78, pp. 387–409.
Biddle Consulting Group, Inc. (2005). Manual for Test Analysis and
Validation Program (TVAP) v4.1. Folsom, CA: Author.
Biesheuvel, S. (1943). African intelligence. Johannesburg: South
African Institute of Race Relations.
Biesheuvel, S. (1949). Psychological tests and their application to
Non-European peoples. In The year book of education, pp. 87–
126. London: Evans.
Biesheuvel, S. (1952). Personnel selection tests for Africans. South
African Journal of Science, 49, pp. 3–12.
Birkenstock, C. (2017). Gender differences in work-family conflict as a
predictor of well-being in the transport sector. Unpublished
master’s thesis. University of Johannesburg, Johannesburg.
Blanchflower, D.G. & Oswald, A.J. (2002). Well-being over time in
Britain and the USA. Journal of Public Economics, 88, pp. 1359–
1386.
Bloom, S. (2008, August). Employee wellness programs, Professional
Safety, pp. 41–42.
Bond, T. G., & Fox, C.M. (2007). Applying the Rasch Model:
Fundamental measurement in the human sciences. Mahwah:
Lawrence Erlbaum Associates.
Bosman, J., Rothmann, S., & Buitendach, J.H. (2005). Job insecurity,
burnout and work engagement: The impact of positive and
negative affectivity. SA Journal of Industrial Psychology, 31(4), pp.
48–56.
Bostwick, J. (1994). Neuropsychiatry of depression. In J. Ellison, C.
Weinstein, & T. Hodel-Malinofsky (Eds.), The psychotherapist’s
guide to neuropsychiatry, pp. 409–432. Washington, DC: American
Psychiatric Press.
Botha, C. & Pienaar, J. (2006). South African correctional occupational
stress: The role of psychological strengths. Journal of Criminal
Justice, 34, pp. 73–84.
Bracken, B.A. & Barona, A. (1991). State of the art procedures for
translating, validating and using psychoeducational tests in cross-
cultural assessment. School Psychology International, 12, pp.
119–132.
Brescia, W. & Fortune, J.C. (1989). Standardized testing of American
Indian students. College Student Journal, 23, pp. 98–104.
Brislin, R.W. (1986). The wording and translation of research
instruments. In W.J. Lonner & J.W. Berry (Eds.), Field methods in
cross-cultural psychology, pp. 137–164. Newbury Park, CA: Sage
Publishers.
Brody, N. (2003a). Construct validation of the Sternberg Triarchic
Abilities Test: comment and reanalysis. Intelligence, 31, pp. 319–
329.
Brody, N. (2003b). What Sternberg should have concluded.
Intelligence, 31, pp. 339–342.
Brooks-Gunn, J. (1990). Identifying the vulnerable young children. In
D.E. Rogers and E. Ginsberg (Eds.), Improving the life chances of
children at risk, pp. 104–124. Boulder, CO: Westview Press.
Brown, A., & Bartram, D. (2018). The Occupational Personality
Questionnaire revolution: Applying Item Response Theory to
questionnaire design and scoring. Corporate Executive Board.
Retrieved on 13 September 2018 from
https://www.cebglobal.com/content/dam/cebglobal/us/EN/regions/uk
personality-quesionnaire-white-paper-uk.pdf
Budoff, M. (1987). Measures for assessing learning potential. In C.S.
Lidz (Ed.). Dynamic assessment: An interactional approach to
evaluating learning potential, pp. 173–195. New York: Guilford
Press.
Bunderson, C.B., Inouye, D.K., & Olsen, J.B. (1989). The four
generations of computerized educational measurement. In R.L.
Linn (Ed.). Educational measurement (3rd edition). New York:
Macmillan.
Burke, A. (2008). Could anxiety, hopelessness and health locus of
control contribute to the outcome of a kidney transplant? SA
Journal of Psychology, 38(3), pp. 527–540.
Burke, E. (2006). Better practice for unsupervised online assessment.
SHL White Paper. London: SHL Group. (www.shl.com)
Busbin, J.W. & Campbell, D.P. (1990). Employee wellness programs:
A strategy for increasing participation. Journal of Health Care
Marketing, 10(4), pp. 22–30.
Busichio, K., Tiersky, L., DeLuca, J., & Natelson, B. (2004).
Neuropsychological deficits in patients with chronic fatigue
syndrome. Journal of the International Neuropsychological Society,
10(2), pp. 278–285.
Busseri, M.A., Sadava, S.W. & DeCourville, N. (2007). A hybrid
model for research on subjective well-being: Examining common-
and component-specific sources of variance in life satisfaction,
positive affect and negative affect. Social Indicators Research,
83(3), pp. 413–445.
Butcher, J.N., Hass, G.A. & Paulsen, J.A. (2016). Clinical
assessments in international settings. In F.T. Leong, D. Bartram,
F.M. Cheung, K.F. Geisinger, & D Iliescu. (Eds.), The ITC
international handbook of testing and assessment (pp. 217−230).
New York, NY: Oxford University Press.
Byrne, B.M., Oakland, T., Leong, F.T., Van de Vijver, F.J.R.,
Hambleton, R.K., Cheung, F.M., & Bartram, D. (2009). A critical
analysis of cross-cultural research and testing practices:
Implications for improved education and training in psychology.
Training and Education in Professional Psychology, 3(2), pp. 94–
105. doi: 10.1037/a0014516
Caffrey, E., Fuchs, D., & Fuchs, L.S. (2008). The predictive validity of
dynamic assessment. The Journal of Special Education, 41(4), pp.
254–270.
Campbell, D.P. & Fiske, D.W. (1959). Convergent and discriminant
validation by the multitrait-multimethod matrix. Psychological
Bulletin, 56, pp. 81–105.
Campion, M. C., Campion, M. A., Campion, E. D., & Reider, M.H.
(2016). Initial investigation into computer scoring of candidate
essays for personnel selection. Journal of Applied Psychology,
101(7), pp. 958−975.
Campione, J.C. & Brown, A.L. (1987). Linking dynamic assessment
with school achievement. In C.S. Lidz (Ed.). Dynamic assessment:
An interactional approach to evaluating learning potential, pp. 82–
115. New York: Guilford Press.
Carroll, J.B. (1997). Psychometrics, intelligence and public perception.
Intelligence, 24(1), pp. 25–52.
Carvalho, A. (2005). Meaning of work and life role salience in a South
African context: A cross-cultural comparison. Unpublished master’s
dissertation, University of Johannesburg, Johannesburg.
Cascio, W.F. (2006). Safety, health and employee assistance
programs. In Managing human resources: Productivity, quality of
work life, profits (7th ed., pp. 584–623). Boston: McGraw-Hill.
Cascio, W.F. & Aguinis, H. (2005). Applied Psychology in Human
Resource Management. Upper Saddle River, New Jersey:
Pearson.
Casillas, A. & Robbins, S.B. (2005). Test adaptation and cross-cultural
assessment from a business perspective: Issues and
recommendations. International Journal of Testing, 5(1), pp. 5–21.
Cattell, R.B. & Cattell, M.D. (1969). The High School Personality
Questionnaire. Champaign, IL: Institute for Personality and Ability
Testing.
Cattell, R.B., Cattell, A.K., & Cattell, H.E. (1993). The Sixteen
Personality Factor Questionnaire, Fifth Edition. Champaign, IL:
Institute for Personality and Ability Testing, Inc.
Cattell, R.B., Eber, H.W., & Tatsuoka, M.M. (1970). Handbook for the
Sixteen Personality Factor Questionnaire. Champaign, IL: Institute
for Personality and Ability Testing.
Cattell, R.B., Saunders, D.R. & Stice, G.F. (1950). The 16 Personality
Factor Questionnaire. Champaign, IL: Institute for Personality and
Ability Testing.
Cattell, R.B. & Scheier, I.H. (1961). The meaning and measurement of
neuroticism and anxiety. New York: Ronald Press.
Cattell, R.B., Scheier, I.H., & Madge, E.M. (1968). Manual for the
IPAT Anxiety Scale. Pretoria: HSRC.
CEB. (2006). OPQ32 technical manual. Surrey, UK.
CEB. (2013−2016). Assessment catalogue. Pretoria: CEB Inc.
Chambers, L.W. (1993). The McMaster Health Index Questionnaire:
An update. In S.R. Walker & R.M. Rosen (Eds.), Quality of life
assessment: Key issues of the 1990s. Lancaster, England: Kluwer
Academic Publishers.
Charlesworth, W.R. (1992). Darwin and developmental psychology:
Past and present. Developmental Psychology, 28(1), pp. 5–6.
Cheung, F. M., & Cheung, S.F. (2003). Measuring personality and
values across cultures: Imported versus indigenous measures. In
W.J. Lonner, D.L. Dinnel, S.A. Hayes, & D.N. Sattler (Eds.), Online
Readings in Psychology and Culture (Unit 6, Chapter 5),
(http://www.wwu.edu/~culture), Center for Cross- Cultural
Research, Western Washington University, Bellingham,
Washington USA.
Cheung, F.M. & Fetvadjiev, V.H. (2016). Indigenous approaches to
testing and assessment. In F.T. Leong, D. Bartram, F.M. Cheung,
K.F. Geisinger, & D. Iliescu (Eds.), The ITC international
handbook of testing and assessment (pp.333−346). New York, NY:
Oxford University Press.
Cheung, F. M., Leung, K., Fan, R., Song, W. Z., Zhang, J. X., &
Zhang, J.P. (1996). Development of the Chinese Personality
Assessment Inventory (CPAI). Journal of Cross-Cultural
Psychology, 27, pp. 181–199.
Cheung, G. W., & Rensvold, R.B. (1998). Testing measurement
models for factorial invariance: A systematic approach.
Educational and Psychological Measurement, 58, pp. 1017–1034.
Cheung, G. W., & Rensvold, R.B. (1999). Testing factorial invariance
across groups: A reconceptualization and proposed new method.
Journal of Management, 25(1), pp. 1–27.
Cheung, G. W., & Rensvold, R.B. (2000). Assessing extreme and
acquiescence response sets in cross-cultural research using
structural equations modeling. Journal of Cross-Cultural
Psychology, 31(2), pp. 187–212. doi:
10.1177/0022022100031002003.
Cheung, G. W., & Rensvold, R.B. (2002). Evaluating goodness-of-fit
indexes for testing measurement invariance. Structural Equation
Modeling: A Multidisciplinary Journal, 9(2), pp. 233–255. doi:
10.1207/s15328007sem0902_5.
Cheung, F.M., Van de Vijver, F.J.R., & Leong, F.T.L. (2011). Toward a
new approach to the study of personality in culture. American
Psychologist, 66, pp. 593−603.
Claassen, N.C.W. (1990). The comparability of General Scholastic
Aptitude Test scores across different population groups. SA
Journal of Psychology, 20, pp. 80–92.
Claassen, N.C.W. (1995). Cross-cultural assessment in the human
sciences. Paper presented at a work session on the meaningful
use of psychological and educational tests, HSRC, 24 October
1995.
Claassen, N.C.W. (1996). Paper and Pencil Games (PPG): Manual.
Pretoria: HSRC.
Claassen, N.C.W. (1997). Cultural differences, politics and test bias in
South Africa. European Review of Applied Psychology, 47(4), pp.
297–307.
Claassen, N.C.W., De Beer, M., Ferndale, U., Kotzé, M., Van Niekerk,
H.A., Viljoen, M., & Vosloo, H.N. (1991). General Scholastic
Aptitude Test (GSAT) Senior Series. Pretoria: HSRC.
Claassen, N.C.W., De Beer, M., Hugo, H.L.E., & Meyer, H.M. (1991).
Manual for the General Scholastic Aptitude Test (GSAT). Pretoria:
HSRC.
Claassen, N.C.W. & Hugo, H.L.E. (1993). The relevance of the
general Scholastic Aptitude Test (GSAT) for pupils who do not
have English as their mother tongue (Report ED-21). Pretoria:
HSRC.
Claassen, N.C.W., Krynauw, A., Paterson, H., & wa ga Mathe, M.
(2001). A standardisation of the WAIS-III for English-speaking
South Africans. Pretoria: HSRC.
Claassen, N.C.W. & Schepers, J.M. (1990). Groepverskille in
akademiese intelligensie verklaar op grond van verskille in sosio-
ekonomiese status (Group differences in academic intelligence
that can be explained by differences in socio-economic status). SA
Journal of Psychology, 20, pp. 294–302.
Claassen, L., Schepers, J.M. & Roodt, G. (2004). Werkwaardes van
akademici [Work values of academics]. SA Journal of Industrial
Psychology, 30(4), pp. 82–92.
Claassen, N.C.W., Van Heerden, J.S., Vosloo, H.N., & Wheeler, J.J.
(2000). Manual for the Differential Aptitude Tests Form R (DAT-R).
Pretoria: HSRC.
Clarke, M.M., Madaus, G.F., Horn, C.L., & Ramos, M.A. (2000).
Retrospective on educational testing and assessment in the 20th
century. Journal of Curriculum Studies, 32(2), pp. 159–181.
Cleary, T.A. (1968). Test bias: prediction of grades of Negro and white
students in integrated colleges. Journal of Educational
Measurement, 5, pp. 115–124.
Cloete, P.A. (2015). The impact of an employee wellbeing programme
on return on investment in terms of absenteeism and employee
psychosocial functioning. Unpublished master’s dissertation,
University of Pretoria, South Africa.
Cockcroft, K., Alloway, T., Copello, E., & Milligan, R. (2015). A cross-
cultural comparison between South African and British students on
the Wechsler Adult Intelligence Scales Third Edition (WAIS-III).
Frontiers in Psychology. doi: 10.3389/fpsyg.2015.00297
Cockcroft, K., Amod, Z., & Soellaart, B. (2008). Maternal level of
education and performance on the 1996 Griffiths Mental
Development Scales. African Journal of Psychiatry, 11, pp. 44−50.
Cockcroft, K. & Blackburn, M. (2008). The relationship between Senior
South African Individual Scale – Revised (SSAIS-R) subtests and
reading ability. SA Journal of Psychology, 38(2), pp. 377–389.
Cockcroft, K., Bloch, L. & Moolla, A. (2016). Assessing verbal
functioning in South African school beginners from diverse
socioeconomic backgrounds: A comparison between verbal
working memory measures and vocabulary measures. Education
as Change, 20(1), pp. 199−215. doi:
http://dx.doi.org/10.17159/1947-9417/2016/559
Cockcroft, K. & Israel, N. (2011). The Raven’s Advanced Progressive
Matrices: A comparison of relationships with verbal ability tests. SA
Journal of Psychology, 41(3), pp. 363–372.
Coetzee, C.J.H., Fourie, L., & Roodt, G. (2002). The development and
validation of the communication-for-change questionnaire. SA
Journal of Industrial Psychology, 28(3), pp. 16–25.
Coetzee, N. & Vosloo, H.N. (2000). Manual for the Differential
Aptitude Tests Form K (DAT-K). Pretoria: HSRC.
Coetzer, W. J., & Rothmann, S. (2004, August). A model of work
wellness of employees in an insurance company. Poster presented
at 28th International Congress of Psychology, Beijing, China.
Cohen, R.J., & Swerdlik, M.E. (2010). Psychological testing and
assessment: An introduction to tests and measurement (7th ed.).
Boston: McGraw-Hill.
Colby, K.M., Watt, J.B. & Gilbert, J.P. (1966). A computer method of
psychotherapy. The Journal of Nervous and Mental Disease,
142(2), pp. 148−152.
Conn, S.R., & Rieke, M.L. (1998). The 16PF Fifth Edition Technical
Manual (2nd ed.). Champaign, IL: Institute for Personality and
Ability Testing, Inc.
Coons, C.E., Frankenburg, W.K., Gay, E. C., Fandal, A.W., Lefly, D.L.
& Ker, C. (1982). Preliminary results of a combined
developmental/environmental screening project. In N.J.
Anastasiow, W.K. Frankenburg & A. Fandal (Eds.), Identifying the
developmentally delayed child, pp. 101–110. Baltimore: University
Park Press.
Cooper, C.L. (2011). Organizational health and wellbeing. Los
Angeles: Sage.
Cormier, D.C., Bulut, O., McGrew, K.S. & Singh, D. (2017). Exploring
the relations between Cattell-Horn-Carroll (CHC) cognitive abilities
and mathematics achievement. Applied Cognitive Psychology, 31,
pp. 530−538.
Corrigall, J., Ward, C., Stinson, K., Struthers, P., Frantz, J., Lund, C.,
Flisher, A.J., & Joska, J. (2007). Decreasing the burden of mental
illness. Western Cape burden of disease reduction project, 4, pp.
1–19.
Cotton, G. (2004). Health in the workplace. Occupational Health,
56(12), pp. 2–6.
Coyne, I. & Bartram, D. (2002). Assessing the effectiveness of
integrity tests: A review. International Journal of Testing, 2(1), pp.
15–34.
Crites, J.O. (1981). Career counselling: Models, methods and
materials. New York: McGraw Hill.
Cronbach, L.J. (1951). Coefficient Alpha and the internal structure of
tests. Psychometrika, 16(3), pp. 297–334.
doi:10.1007/bf02310555.
Cronbach, L. (1970). Psychological testing (3rd ed.). New York:
Harper & Row.
Cronbach, L.J. (1990). Essentials of Psychological Testing (5th ed.).
New York: Harper & Row.
Cronbach, L.J. (2004). My current thoughts on Coefficient Alpha and
successor procedures. Educational and Psychological
Measurement, 64, pp. 391–418.
Cronbach, L. J., & Meehl, P.E. (1955). Construct validity in
psychological tests. Psychological Bulletin, 52(4), pp. 281–302.
Csapó, B., Molnár, G. & Nagy, J. (2014). Computer-based
assessment of school readiness and early reasoning. Journal of
Educational Psychology, 106(3), pp. 639−650.
Csikszentmihalyi, M. & Csikszentmihalyi, I.S. (2006). A life worth
living: Contributions to positive psychology. New York: Oxford
University Press.
Cui, Y. & Guo, Q. (2016, July). Statistical classification for cognitive
diagnostic assessment: An artificial neural network approach.
Educational Psychology, 36(6), pp. 1065−1082.
Dai, B., Zhang, M. & Li, G. (2016). Exploration of item selection in
dual-purpose cognitive diagnostic computerized adaptive testing:
Based on the RRUM. Applied Psychological Measurement, 40(8),
pp. 625−640.
Dannhauser, Z. & Roodt, G. (2001). Value disciplines: Measuring
customer preferences. Journal of Industrial Psychology, 27(1), pp.
8–16.
Das, J.P., Naglieri, J.A., & Kirby, J.R. (1994). Assessment of cognitive
process: The PASS theory of intelligence. Boston: Allyn and
Bacon.
Davey, D.M. (1989). How to be a good judge of character. London:
Kogan Page.
Davidson, C.A., Johannesen, J.K. & Fiszdon, J.M. (2016). Role of
learning potential in cognitive remediation: Construct and
predictive validity. Schizophrenia Research, 171, pp. 117−124.
Dawis, R.V. (2002). Person-environment-correspondence theory. In
D. Brown & Associates (Eds.), Career Choice and Development
(4th ed.; pp. 427–464). San Francisco, CA: Jossey-Bass.
Dawis, R.V., & Lofquist, L.H. (1984). A psychological theory of work
adjustment. Minneapolis, MN: University of Minnesota Press.
De Beer, M. (2000a). Learning potential computerized adaptive test
(LPCAT): User’s manual. Pretoria: Unisa Press.
De Beer, M. (2000b). The construction and evaluation of a dynamic,
computerised adaptive test for the measurement of learning
potential. Unpublished D. Litt et Phil dissertation, University of
South Africa.
De Beer, M. (2004). Use of differential item functioning (DIF) analysis
for bias analysis in test construction. SA Journal of Industrial
Psychology, 30(4), pp. 52–58.
De Beer, M. (2005). Development of the Learning Potential
Computerized Adaptive Test (LPCAT). SA Journal of Psychology,
35(4), pp. 717–747.
De Beer, M. (2006). Dynamic testing: Practical solutions to some
concerns. SA Journal of Industrial Psychology, 32(4), pp. 8–14.
De Beer, M. (2010). A modern psychometric approach to dynamic
assessment. Journal of Psychology in Africa, 20(2), pp. 241–246.
De Beer, M. (2013). The Learning Potential Computerised Adaptive
Test in South Africa. In S. Laher & K. Cockcroft (Eds.),
Psychological assessment in South Africa: Research and
applications (pp. 137-157). Johannesburg: Wits University Press.
De Beer, M., Botha, E.J., Bekwa, N.N. & Wolfaardt, J.B. (1999).
Industrial psychological testing and assessment (Study guide for
IOP301-T). Pretoria: University of South Africa.
De Bod, A. (2011). Contemporary Ethical Challenges in Industrial
Psychology Testing in South Africa. Unpublished M Phil
dissertation, University of Johannesburg, South Africa.
De Bruin, G.P. (1996). An item factor analysis of the Myers-Briggs
Type Indicator (Form G Self-scorable). Paper presented at the
International Type User’s Conference. Sandton, Johannesburg.
De Bruin, G. (2000). An inter-battery factor analysis of the Comrey
personality scales and the 16 Personality Factor questionnaire. SA
Journal of Industrial Psychology, 26(3), pp. 4–7.
doi:10.4102/sajip.v26i3.712
De Bruin, G.P. (2002). The relationship between personality traits and
vocational interests. Journal of Industrial Psychology, 28(1), pp.
49–52.
De Bruin, G.P., & Bernard-Phera, M.J. (2002). Confirmatory factor
analysis of the Career Development Questionnaire and the Career
Decision-making Self-Efficacy Scale for South African high school
students. SA Journal of Industrial Psychology, 28(2), pp. 1–6.
De Bruin, G.P. & De Bruin, K. (2017). Career assessment. In G.B.
Stead & M.B. Watson (Eds.), Career psychology in the South
African context (pp. 185−193). Pretoria, South Africa: Van Schaik.
De Bruin, K., De Bruin, G.P., Dercksen, S., & Hartslief-Cilliers, M.
(2005). Can measures of personality and intelligence predict
performance in an adult basic education training programme? SA
Journal of Psychology, 35, pp. 46–57.
De Bruin, G.P. & Nel, Z.J. (1996). ’n Geïntegreerde oorsig van
empiriese navorsing oor loopbaanvoorligting in Suid-Afrika: 1980–
1990 (An integrated overview of empirical research concerning
career counselling in South Africa: 1980–1990). SA Journal of
Psychology, 26, pp. 248–251.
De Bruin, G. P., & Rudnick, H. (2007). Examining the cheats: The role
of conscientiousness and excitement seeking in academic
dishonesty. SA Journal of Psychology, 37(1), pp. 153–164.
De Bruin, G. P., Schepers, J. M., & Taylor, N. (2005). Construct
validity of the 16PF 5th Edition and the Basic Traits Inventory.
Paper presented at the 8th Annual Industrial Psychology
Conference, Pretoria.
De Bruin, G.P. & Taylor, N. (2005). Development of the Sources of
Work Stress Inventory. SA Journal of Psychology, 35(4), pp. 748–
765.
De Bruin, G. P., & Taylor, N. (2008, July). Block versus mixed-order
item presentation in personality inventories: An IRT analysis.
Paper presented at the XXIX International Congress of
Psychology, Berlin, Germany.
Deci, E.L. & Ryan, R.M. (2008). Hedonia, eudaimonia, and well-being:
An introduction. Journal of Happiness Studies, 9, pp. 1–11.
DeLamatre, J.E., & Schuerger, J.M. (2000). The MMPI-2 in counseling
practice. In C.E.J. Watkins (Ed.), Testing and assessment in
counseling practice (2nd ed., pp. 15–44). Mahwah, NJ: Erlbaum.
Delgado, M.T., Uribe, P.A., Alonso, A.A. & Diaz, R.R. (2016). TENI: A
comprehensive battery for cognitive assessment based on games
and technology. Child Neuropsychology, 22(3), pp. 276−291.
Demerouti, E., & Bakker, A.B. (2011). The job demands-resources
model: Challenges for future research. SA Journal of Industrial
Psychology, 37(2), pp. 1–9. doi: 10.4102/sajip.v37i2.974.
Demerouti, E., Bakker, A.B., Nachreiner, F., & Schaufeli, W.B. (2001).
The job demands-resources model of burnout. Journal of Applied
Psychology, 86(3), pp. 499–512.
Department of Education (2001). Education White Paper 6. Special
needs education: Building an inclusive education and training
system. Pretoria: Department of Education.
Department of Education (2005). Conceptual and operational
guidelines for the implementation of inclusive education: District-
based support teams. Pretoria: Department of Education.
Department of Health (2006). Ethical Rules of Conduct for
Professionals Registered under the Health Professions Act, 1974.
Government Gazette, No. 29079, 4 August 2006.
Department of Health (2008). Regulations Defining the Scope of the
Profession of Psychology. Government Gazette, No. 31433, 16
September 2008. Retrieved 30 November 2012 from
http://www.hpcsa.co.za/downloads/psychology/scope_of_profession
Department of Health (2011). Regulations Defining the Scope of the
Profession of Psychology. Government Gazette, No. 34581, 2
September 2011. Retrieved 30 November 2012 from
http://www.hpcsa.co.za/downloads/psychology/promulgated_scope
Department of Justice (2000). Promotion of Access to Information Act
No. 2 of 2000. Republic of South Africa. Available at
http://www.justice.gov.za/legislation/acts/2000-002.pdf
Department of Justice (2013). Protection of Personal Information Act
No. 4 of 2013. Republic of South Africa. Available at
www.justice.gov.za/legislation/acts/2013-004.pdf
Department of Labour of the Republic of South Africa (2014, January
16). Employment Equity Amendment Act. Government Gazette,
583(37238).
De Ridder, J.C. (1961). The personality of the urban African in South
Africa. New York: Humanities Press.
Desai, F. (2010). The relationship between personality traits and team
culture. Unpublished master’s dissertation. Pretoria, South Africa:
University of South Africa.
De Sousa, D.S., Greenop, K., & Fry, J. (2010). Simultaneous and
sequential cognitive processing in monolingual and bilingual
children in South Africa. SA Journal of Psychology, 40(2), pp. 165–
173.
Dickman, B.J. (1994). An evaluation of the Grover-Counter Test for
use in the assessment of black South African township children
with mental handicaps. Unpublished Doctoral dissertation,
University of Cape Town.
Diener, E. (2009). Assessing well-being: The collected works of Ed
Diener. New York: Springer.
Diener, E. & Emmons, R.A. (1984). The independence of positive and
negative affect. Journal of Personality and Social Psychology,
47(5), pp. 1105–1117.
Diener, E., Emmons, R. A., Larsen, R. J., & Griffin, S. (1985). The
Satisfaction With Life Scale. Journal of Personality Assessment,
49, pp. 71–75.
Diener, E., Kesebir, P. & Lucas, R. (2008). Benefits of accounts of
well-being – for societies and for psychological science. Applied
Psychology: An International Review, 57, pp. 37–53.
Diener, E., Sapyta, J.J. & Suh, E. (1998). Subjective well-being is
essential to well-being. Psychological Inquiry, 9(1), pp. 33–38.
Diener, E., Scollon, C.N. & Lucas, R.E. (2009). The evolving concept
of subjective well-being: The multifaceted nature of happiness. In
E. Diener (Ed.), Social indicators research series: Vol. 39.
Assessing well-being: The collected works of Ed Diener (pp.
67−100). New York, NY, US: Springer Science + Business Media.
Dierendonck, D., Diaz, D., Rodriguez-Carvajal, R., Blanco, A. &
Moreno-Jiménez, B. (2008). Ryff’s six-factor model of
psychological well-being. Social Indicators Research, 87(3), pp.
473–479.
Di Fabio, A. & Maree, J.G. (2013). Career counselling: The usefulness
of the Career Interest Profile. Journal of Psychology in Africa,
23(1), pp. 41−49. doi: 10.1080/14330237.2013.10820592
Domino, G. & Domino, M.L. (2006). Psychological testing: An
introduction (2nd ed.). New York: Cambridge University Press.
Donadello, I., Spoto, A., Badaloni, S., Granziol, U. & Vidotto, G.
(2017). ATS-PD: An adaptive testing system for psychological
disorders. Educational and Psychological Measurement, 77(5), pp.
792−815.
Dowdeswell, K. & Joubert, T. (2014). Cheating in online unsupervised
assessment & verification trends across Africa. Paper presented at
the 16th annual conference of the Society for Industrial &
Organisational Psychology of South Africa, Pretoria.
Drozdick, L.W., Getz, K., Raiford, S. E., & Zhang, O. (2016). WPPSI-
IV: Equivalence of Q-interactive and paper formats. Q-interactive
Technical Report 14. Bloomington, MN: Pearson. Retrieved 3 May
2018 from
http://www.helloq.com/content/dam/ped/ani/us/helloq/media/WPPSI
Qi-Tech-Report-14-FNL.pdf
Dunn, L.M. & Dunn, L.M. (1981). Peabody Picture Vocabulary Test –
Revised. Circle Pines, MN: American Guidance Service.
Du Toit, H.A. (1986). ’n Voorlopige standaardisering van die SAIS
(Nie-verbaal) vir gehoorgestremdes (A preliminary standardisation
of the SSAIS (Non-verbal) for the hearing disabled). Unpublished
master’s dissertation, University of Stellenbosch.
Du Toit, R. & De Bruin, G.P. (2002). The structural validity of Holland’s
R-I-A-S-E-C model of vocational personality types for young black
South African men and women. Journal of Career Assessment, 10,
pp. 62–77.
Du Toit, R. & Gevers, J. (1990). Die Self- ondersoekvraelys ten
opsigte van beroepsbelangstelling (SOV): Handleiding (The self-
examination questionnaire regarding career interest: Handbook).
Pretoria: Raad vir Geesteswetenskaplike Navorsing (HSRC).
Du Toit, W. & Roodt, G. (2003). The discriminant validity of the Culture
Assessment Instrument: A comparison of company cultures. SA
Journal of Human Resource Management, 1(1), pp. 77–84.
Eaton, W. W., Smith, C., Ybarra, M., Muntaner, C., & Tien, A. (2004).
Center for Epidemiologic Studies Depression Scale: Review and
revision (CESD and CESD-R). In M. E Maruish (Ed.), The use of
psychological testing for treatment planning and outcomes
assessment. Volume 3: Instruments for Adults (3rd ed.; pp.
363−377). Mahwah, NJ: Lawrence Erlbaum.
Edwards, S. D., & Edwards, D.J. (2009). Counselling for psychological
well-being in persons living with HIV/Aids. Journal of Psychology in
Africa, 19(4), pp. 561−563.
Ellis, B.B. (1989). Differential item functioning: Implications for test
translation. Journal of Applied Psychology, 74, pp. 912–921.
Ellis, B.B. (1991). Item Response Theory: A tool for assessing the
equivalence of translated tests. Bulletin of the International Test
Commission, 18, pp. 33–51.
Ellis, L.L. (2007). The impact of HIV/Aids on selected business sectors
in South Africa. Studies in Economics and Econometrics, 31(1),
pp. 29−52.
Embretson, S. E., & Reise, S.P. (2000). Item Response Theory for
psychologists. Mahwah, NJ: Lawrence Erlbaum Associates.
Employment Equity Act, number 55 (1998). Government Gazette, 400
(19370), Cape Town, 19 October.
Encouraging corporate wellness maintenance (29 August 2007).
Retrieved July 15, 2008, from
http://www.bizcommunity.com/Article/196/141/17623.html
England, J. & Zietsman, G. (1995). The HSRC stakeholder survey on
assessment. Internal report prepared for Human Resources:
Assessment and Information Technology, HSRC, Pretoria, South
Africa.
Epic Games Inc. (2017). Unreal engine. Retrieved on 13 September
2018 from: https://unrealengine.com/en-US/blog
Erasmus, P.F. (1975). TAT-Z Catalogue No. 1676. Pretoria: HSRC.
Evers, A. (2012). The internationalization of test reviewing: Trends,
differences, and results. International Journal of Testing, 12, pp.
136–156.
Evers, A., Sijtsma, K., Lucassen, W. & Meijer, R. (2010). The Dutch
review process for evaluating the quality of psychological tests:
History, procedure, and results. International Journal of Testing,
10, pp. 295–317.
Exner, J.E. (2003). The Rorschach: A comprehensive system: Basic
foundations and principles of interpretation. Vol. 1 (4th ed.).
Hoboken, NJ: John Wiley & Sons.
Eysenck, H.J. (1988). The concept of ‘intelligence’: Useful or useless?
Intelligence, 12, pp. 1–16.
Eysenck, H.J. & Eysenck, M.W. (1985). Personality and individual
differences. New York: Plenum Publishers.
Eysenck, H.J. & Kamin, L. (1981). The intelligence controversy. New
York: John Wiley and Sons.
Fagbola, T.M., Adigun, A.A. & Oke, A.O. (2013). Computer-based test
(CBT) system for university academic enterprise examination.
International Journal of Scientific and Technology Research, 2(8),
pp. 336−342.
Fancher, R.E. (1985). The intelligence men: Makers of the IQ
controversy. New York: W.W. Norton & Company.
Feldman, F. (2010). What is this thing called happiness? New York:
Oxford University Press.
Ferrett, H. C., Carey, P. D., Thomas, K.G. F., Tapert, S. F., & Fein, G.
(2010). Neuropsychological performance of South African
treatment-naïve adolescents with alcohol dependence. Drug and
Alcohol Dependence, 110, pp. 8–14.
Fetvadjiev, V.H., Meiring, D., Van de Vijver, F.J., Nel, J.A. & Hill, C.
(2015). The South African personality inventory (SAPI): A culture-
informed instrument for the country’s main ethnocultural groups.
Psychological Assessment, 27(3), pp. 827−837.
Feuerstein, R., Rand, Y., Jensen, M.R., Kaniel, S., & Tzuriel, D.
(1987). Prerequisites for assessment of learning potential: The
LPAD model. In C.S. Lidz (Ed.), Dynamic assessment: An
interactive approach to evaluating learning potential, (pp. 35–51).
New York: Guilford Press.
Fick, M.L. (1929). Intelligence test results of poor white, native (Zulu),
coloured and Indian school children and the educational and social
implications. South African Journal of Science, 26, pp. 904–920.
Finkelman, M. D., Lowe, S. R., Kim, W., Gruebner, O., Smits, N., &
Galea, S. (2017). Customized computer-based administration of
the PCL-5 for the efficient assessment of PTSD: A proof-of-
principle study. Psychological Trauma: Theory, Research,
Practice, and Policy, 9(3), pp. 379−389.
Fischer, R., & Fontaine, J.R.J. (2011). Methods for investigating
structural equivalence. In D. Matsumoto & F.J.R. Van de Vijver,
Cross-cultural Research Methods in Psychology (pp. 179–215).
Cambridge, NY: Cambridge University Press.
Fisher, K.N. & Broom, Y.M. (2006). Computerised neuropsychological
testing in South Africa. Paper presented at the 10th national
conference of the South African Clinical Neuropsychological
Association, Cape Town, South Africa.
Folstein, M.F., Folstein, S.E. & McHugh, P.R. (1975). Mini-mental
state. Journal of Psychiatric Research, 12, pp. 189–195.
Fontaine, J. (2005). Equivalence. In Encyclopedia of social
measurement (Vol. 1, pp. 803–813). New York, NY: Elsevier.
Foster, D.F. (2010). World-wide testing and test security issues:
Ethical challenges and solutions. Ethics & Behavior, 20(3–4), pp.
207–228.
Foster, W.H. & Vaughan, R.D. (2005). Absenteeism and business
costs: Does substance abuse matter? Journal of Substance
Abuse Treatment, 28(1), p. 27.
Foulke, J. & Sherman, B. (2005). Comprehensive workforce health
management – not a cost, but a strategic advantage. Employment
Relations Today, 32(2), pp. 17–29.
Fourie, L., Rothmann, S., & Van de Vijver, F.J.R. (2008). A model of
work-related well-being for non-professional counsellors in South
Africa. Stress & Health, 24, pp. 35–47.
Foxcroft, C.D. (1997a). Updated policy document on psychological
assessment: The challenges facing us in the new South Africa.
Unpublished manuscript, University of Port Elizabeth.
Foxcroft, C.D. (1997b). Psychological testing in South Africa:
Perspectives regarding ethical and fair practices. European
Journal of Psychological Assessment, 13(3), pp. 229–235.
Foxcroft, C.D. (2004a). Evaluating the school-leaving examination
against measurement principles and methods: From the
Matriculation examination to the FETC. Paper presented at the
Matric Colloquium of the HSRC, Pretoria, November 2004.
Foxcroft, C.D. (2004b). Computer-based testing: Assessment
advances and good practice guidelines for practitioners. Workshop
presented at the 10th national congress of the
Psychological Society of South Africa, Durban, September 2005.
Foxcroft, C.D. (2004c). Planning a Psychological Test in the
Multicultural South African Context. SA Journal of Industrial
Psychology, 30(4), pp. 8–15.
Foxcroft, C.D. (2005). Benchmark tests: What, why and how. Pretoria:
SAUVCA.
Foxcroft, C.D. (2008). NMMU access testing: Philosophy, principles,
and process. Paper presented at a symposium on admissions
testing at the University of Johannesburg, April 2008.
Foxcroft, C.D. (2011a). Ethical issues related to psychological testing
in Africa: What I have learned (so far). In W.J. Lonner, D.L. Dinnel,
S.A. Hayes, & D.N. Sattler (Eds.), Online Readings in Psychology
and Culture (Unit 5, Chapter 4), Center for Cross-Cultural
Research, Western Washington University, Bellingham,
Washington USA. Retrieved from
http://scholarworks.gvsu.edu/orpc/vol2/iss2/7.
Foxcroft, C.D. (2011b). Some issues in assessment in a developing
world context: An African perspective. Invited stimulus paper for an
internet-based symposium/forum on Issues in Assessment and
Evaluation as part of the ASEASEA International conference held
between 11 and 18 July 2011 and hosted by the Wits Programme
Evaluation Group.
http://wpeg.wits.ac.za/components/com_agora/img/members/3/Fox
paperandintro.pdf
Foxcroft, C.D. (2017). Ethical issues in conducting child development
research in Sub-Saharan Africa. In A. Abubakar & F. Van de Vijver
(Eds.), Handbook of applied developmental science in Sub-
Saharan Africa. New York: Springer-Verlag.
Foxcroft, C.D. & Aston, S. (2006). Critically examining language bias
in the South African adaptation of the WAIS-III. SA Journal of
Industrial Psychology, 32(4), pp. 97–102.
Foxcroft, C.D. & Davies, C. (2008). Historical perspectives on
psychometric testing in South Africa. In C. van Ommen & D.
Painter (Eds.), Interiors: A history of psychology in South Africa
(pp. 152–181). Pretoria: University of South Africa Press.
Foxcroft, C.D. & De Bruin, G.P. (2008). Facing the challenge of the
language of assessment and test adaptation. Paper presented at
the XXIX International Congress of Psychology, Berlin, Germany,
July 2008.
Foxcroft, C.D., Paterson, H., Le Roux, N., & Herbst, D. (2004).
Psychological assessment in South Africa: A needs analysis.
Pretoria: HSRC.
Foxcroft, C.D. & Shillington, S.J. (1996). Sharing psychology with
teachers: A South African perspective. Paper presented at the
26th International Congress of Psychology, Montreal, Canada, 16–
21 August 1996.
Foxcroft, C., Watson, M.B., de Jager, A., & Dobson, Y. (1994).
Construct validity of the Life Role Inventory for a sample of
university students. Paper presented at the National Conference of
the Student Counsellors Society of South Africa. Cape Town.
Foxcroft, C.D., Watson, A.S.R., & Seymour, B.B. (2004). Personal
and situational factors impacting on CBT practices in developing
countries. Paper presented at the 28th International Congress of
Psychology, Beijing, China, 8–13 August 2004.
Frankenburg, W.K. & Dodds, J.B. (1990). Denver II: Screening
Manual. Denver, Colorado: Denver Developmental Materials.
Frankenburg, W.K., Dodds, J.B., Archer, P., Bresnick, B., Maschka,
P., Edelman, N., & Shapiro, H. (1990). Denver II: Technical
Manual. Denver, Colorado: Denver Developmental Materials.
Fremer, J. (1997). Test-taker rights and responsibilities. Paper
presented at the Michigan School Testing Conference, Ann Arbor,
Michigan, USA, February 1997.
Fritz, E. (2016). Flowing between assessment and therapy through
creative expressive arts. In R. Ferreira (Ed.). Psychological
assessment: Thinking innovatively in contexts of diversity (pp.
103−209). Cape Town: Juta and Company.
Furnham, A., Ndlovu, N. & Mkhize, N. (2009). South African Zulus’
beliefs about their own and their children’s intelligence. South
African Journal of Psychology, 39(2), pp. 157−168. doi:
10.1177/008124630903900202.
Gadd, C. & Phipps, W.D. (2012). A preliminary standardisation of the
Wisconsin Card Sorting Test for Setswana-speaking university
students: Cognitive science and neuropsychology. SA Journal of
Psychology, 42(3), pp. 389–398.
Gallagher, A., Bridgeman, B., & Cahalan, C. (2000). The effect of
computer-based tests on racial/ethnic, gender and language
groups. GRE Board Professional Report No. 96–21P. Princeton,
NJ: Educational Testing Service.
Garcia Coll, C. & Magnuson, K. (2000). Cultural differences as
sources of developmental vulnerabilities and resources. In J.
Shonkoff & S. Meisels (Eds), Handbook of Early Childhood
Intervention (2nd edition), pp. 94–114. Cambridge: Cambridge
University Press.
Gardner, H. (1983). Frames of mind: The theory of multiple
intelligences. New York: Basic Books.
Geerlings, H., Glas, C.A.W., & Van der Linden, W.J. (2011). Modeling
rule-based item generation. Psychometrika, 76(2), pp. 337–359.
Geerlings, H., Van der Linden, W.J., & Glas, C.A.W. (2012). Optimal
test design with rule-based item generation. Applied Psychological
Measurement, 37(2), pp. 140−161.
Geisinger, K. & McCormick, C. (2016). Testing individuals with
disabilities. In F.T.L. Leong, D. Bartram, F.M. Cheung, K.F.
Geisinger & D. Iliescu (Eds.), The ITC international handbook of
testing and assessment (pp. 259−275). New York, NY: Oxford
University Press.
Gelderblom, J.H. (2008). Designing technology for young children:
Guidelines grounded in a literature investigation on child
development and children’s technology. Unpublished doctoral
dissertation, University of South Africa, South Africa.
Gelman, D., Hager, M., Doherty, S., Gosnell, S., Raine, G. & Shapiro,
D. (4 May 1987). Depression. Newsweek, pp. 42–48.
Getz, L.M., Chamorro-Premuzic, T., Roy, M.M., & Devroop, K. (2012).
The relationship between affect, uses of music, and music
preferences in a sample of South African adolescents. Psychology
of Music, 40(2), pp. 164–178.
Gibbs, E. (1990). Assessment of infant mental ability. In E.D. Gibbs
and D.M. Teti (Eds), Interdisciplinary assessment of infants: A
guide for early intervention professionals, pp. 7–89. Baltimore, MD:
Paul H Brookes.
Gierl, M.J., Lai, H., Pugh, D., Touchie, C., Boulais, A., & De
Champlain, A. (2016). Evaluating the psychometric characteristics
of generated multiple-choice test items. Applied Measurement in
Education, 29, pp. 196−210.
Gierl, M.J., & Zhou, J. (2008). Computer Adaptive-Attribute Testing: A
new approach to cognitive diagnostic assessment. Zeitschrift für
Psychologie / Journal of Psychology, 216(1), pp. 29–39.
Gierl, M.J., Zhou, J., & Alves, C. (2008). Developing a taxonomy of item
model types to promote assessment engineering. Journal of
Technology, Learning, and Assessment, 7(2), pp. 4–50. Retrieved
30 November 2012 from http://www.jtla.org.
Gintiliene, G. & Girdzijauskiene, S. (2008). Test development and use
in Lithuania. In J. Bogg (Ed.), Testing International, 20, December
2008, pp. 7–8. (Can be accessed at www.intestcom.org).
Goldberg, D.P. (1978). General Health Questionnaire. London: GL
Assessment Limited.
Goldberg, D.P., & Hillier, V.F. (1979). A scaled version of the General
Health Questionnaire. Psychological Medicine, 9(1), pp. 139–145.
Goldberg, L.R., & Johnson, J.A. (2017, February 22). The 3 320 IPIP
items in alphabetical order. Retrieved on 13 September 2018 from
https://ipip.ori.org/AlphabeticalItemList.htm.
Goleman, D. (1995). Emotional intelligence – why it can matter more
than IQ. London: Bloomsbury.
Gordon, L.V. & Alf, E.F. (1960). Acclimatization and aptitude test
performance. Educational and Psychological Measurement, 20,
pp. 333–337.
Gorsuch, R.L. (1983). Factor analysis. Hillsdale, NJ: Lawrence
Erlbaum.
Gottfredson, L.S. (2003). Dissecting practical intelligence theory: Its
claims and evidence. Intelligence, 31, pp. 343–397.
Gould, S.J. (1981). The mismeasure of man. London: The Penguin
Group.
Govender, S.A. (2008). The relationship between personality and
coping amongst members of the South African Police Service.
Unpublished master’s dissertation. Johannesburg, South Africa:
University of South Africa.
Gradidge, D.H. & De Jager, A.C. (2011). Psychometric properties of
the Wellness Questionnaire for Higher Education. SA Journal of
Psychology, 41(4), pp. 517–527.
Graham, J.R. (2006). MMPI-2: Assessing personality and
psychopathology (4th ed.). New York: Oxford University Press.
Gredler, G.R. (1997). Issues in early childhood screening and
assessment. Psychology in the Schools, 34(2), pp. 98–106.
Greenspan, S.I., & Meisels, S.J. (1996). Toward a new vision for the
developmental assessment of infants and young children. In S.J.
Meisels & E Fenichel (Eds.), New Visions for the developmental
assessment of infants and young children, pp. 11–26. Washington,
DC: Zero to Three: National Center for Infants, Toddlers, and
Families.
Gregory, R.J. (2000). Psychological testing: History, principles, and
applications (3rd ed.). Boston: Allyn and Bacon.
Gregory, R.J. (2007). Psychological testing: History, principles and
applications. (5th ed.). Boston: Pearson Education.
Gregory, R.J. (2010). Psychological testing: History, principles and
applications. (6th edition). Boston: Pearson Education.
Griesel, H. (2001). Institutional views and practices: Access,
admissions, selection and placement. In SAUVCA (2001), The
challenges of access and admissions. Pretoria: South African
Universities Vice-Chancellors Association.
Grieve, K.W. & Richter, L.M. (1990). A factor analytic study of the
Home Screening Questionnaire for infants. SA Journal of
Psychology, 20(4), pp. 277–281.
Grieve, K.W., & Van Eeden, R. (1997). The use of the Wechsler
Intelligence Scales for Adults and the implications for research and
practical application in South Africa. Paper presented at a national
symposium on the future of intelligence testing in South Africa.
HSRC, Pretoria, South Africa, 6 March 1997.
Griffiths, R. (1954). The abilities of babies. London: Child
Development Research Centre.
Griffiths, R. (1970). The abilities of young children. London: Child
Development Research Centre.
Grigorenko, E.L., & Sternberg, R.J. (1998). Dynamic testing.
Psychological Bulletin, 124(1), pp. 75–111.
Groth-Marnat, G. (2003). Handbook of Psychological Assessment (4th
ed.). Hoboken, NJ: John Wiley & Sons.
Groth-Marnat, G. (2009). Handbook of psychological assessment (5th
ed.). Hoboken, NJ, US: John Wiley & Sons.
Guariguata, L., De Beer, I., Hough, R., Bindels, E., Weimers-
Maasdorp, D., Feeley, F.G., & De Wit, T.F.R. (2012). Diabetes,
HIV and other health determinants associated with absenteeism
among formal sector workers in Namibia. BMC Public Health,
12(44), pp. 1–12.
Guichard, J. (2005). Life-long self-construction. International Journal
for Educational and Vocational Guidance, 5(2), pp. 111–124.
Guilford, J.P. (1936). Psychometric methods. New York: McGraw-Hill
Book Company.
Gysbers, N.C., Heppner, M.J., & Johnston, J.A. (2014). Career
counseling: Holism, diversity and strengths (4th ed.). Alexandria,
VA: American Counseling Association.
Hagtvet, K.A. & Johnsen, T.B. (Eds.). (1992). Advances in test anxiety
research (Vol. 7). Amsterdam: Swets & Zeitlinger.
Hair, J.F., Black, W.C., Babin, B.J., & Anderson, R.E. (2010).
Multivariate data analysis: A global perspective (7th ed.). Upper
Saddle River, NJ: Pearson Prentice-Hall.
Halpern, R. (2000). Early intervention for low-income children and
families. In J. Shonkoff & S. Meisels (Eds.), Handbook of Early
Childhood Intervention (2nd edition), pp. 361–386. Cambridge:
Cambridge University Press.
Hambleton, R.K. (1993). Translating achievement tests for use in
cross-national studies. European Journal of Psychological
Assessment, 9, pp. 54–65.
Hambleton, R.K. (1994). Guidelines for adapting educational and
psychological tests: a progress report. Bulletin of the International
Test Commission in the European Journal for Psychological
Assessment, 10(3), pp. 229–244.
Hambleton, R.K. (2001). The next generation of ITC test translation
and adaptation guidelines. European Journal for Psychological
Assessment, 17, pp. 164–172.
Hambleton, R.K. & Bollwark, J. (1991). Adapting tests for use in
different cultures: Technical issues and methods. Bulletin of the
International Test Commission, 18, pp. 3–32.
Hambleton, R.K., Merenda, P.F., & Spielberger, C.D. (Eds.) (2005).
Adapting Educational and psychological Tests for Cross-Cultural
Assessment. Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Hambleton, R.K. & Swaminathan, H. (1985). Item Response Theory:
Principles and applications. Boston: Kluwer-Nijhoff.
Hambleton, R.K., Swaminathan, H., & Rogers, H.J. (1991).
Fundamentals of Item Response Theory. Newbury Park, CA: Sage
Publishers.
Hambleton, R.K., & Zenisky, A.L. (2010). Translating and adapting
tests for cross-cultural assessments. In D. Matsumoto & F.J.R.
Van de Vijver, Cross-cultural Research Methods in Psychology
(pp. 46–70). Cambridge, NY: Cambridge University Press.
doi:10.1017/CBO9780511779381.004
Hanekom, N. & Visser, D. (1998). Instemmingsgeneigdheid en
verskillende item- en responsformate in ’n gesommeerde
selfbeoordelingskaal [Response acquiescence and different item
and response formats in summated self-completion scales].
Tydskrif vir Bedryfsielkunde, 24(3), pp. 39–47.
Harris, D.B. (1963). Children’s drawings as measures of intellectual
maturity: A revision and extension of the Goodenough Draw-a-Man
test. New York: Harcourt, Brace and World.
Hart, J.W. & Nath, R. (1979). Sociometry in business and industry:
New developments in historical perspective. Group
Psychotherapy, Psychodrama and Sociometry, 32, pp. 128–149.
Hartung, P.J. (2007). Career construction: Principles and practice. In
K. Maree (Ed.), Shaping the story: A guide to facilitating narrative
counselling (pp. 103–120). Pretoria, South Africa: Van Schaik.
Hartung, P.J. (2015). The career construction interview. In M.
McMahon & M. Watson (Eds.), Career assessment: Qualitative
approaches (pp. 115–121). Rotterdam, The Netherlands: Sense.
Hatang, S. & Venter, S. (Eds.), (2011). Nelson Mandela by Himself.
Johannesburg, South Africa: Pan Macmillan South Africa.
Hathaway, S.R. & McKinley, J.C. (1989). MMPI-2. Minneapolis:
University of Minnesota Press.
Hattie, J., & Cooksey, R.W. (1984). Procedures for assessing the
validities of tests using the ‘Known-Groups’ method. Applied
Psychological Measurement, 8(3), pp. 295–305.
Hatuell, C. (2004). The subjective well-being and coping resources of
overweight adults at Port Elizabeth fitness centres. Unpublished
master’s treatise, University of Port Elizabeth.
Hay, L. (2010). In practice. Perspectives in Public Health, 130(2), 56.
doi: 10.1177/17579139101300020303
Hay, J.F. & Pieters, H.C. (1994). The interpretation of large
differences in a psychiatric population between verbal IQ (VIQ)
and non- verbal IQ (NVIQ) when using the Senior South African
Individual Scale (SSAIS). In R. van Eeden, M. Robinson, & A.
Posthuma (Eds.), Studies on South African Individual Intelligence
Scales, pp. 123–138. Pretoria: HSRC.
Health Professions Act, Act 56 of 1974. Health Professions Council of
South Africa.
Health Professions Council of South Africa. (2014, August). General
ethical guidelines for good practice in telemedicine. Retrieved on
13 September 2018 from:
http://www.hpcsa.co.za/Uploads/editor/UserFiles/downloads/conduc
Heaven, P.C.L., Connors, J., & Stones, C.R. (1994). Three or five
personality dimensions? An analysis of natural language terms in two
cultures. Personality and Individual Differences, 17, pp. 181–
189. doi:10.1016/0191-8869(94)90024-8
Heaven, P.C.L. & Pretorius, A. (1998). Personality structure among
black and white South Africans. Journal of Social Psychology,
138(5), pp. 664–666. doi:10.1080/00224549809600422
Helms-Lorenz, M., Van de Vijver, F.J.R., & Poortinga, Y.H. (2003).
Cross-cultural differences in cognitive performance and
Spearman’s hypothesis: g or c? Intelligence, 31, pp. 9–29.
Henn, C.M., Hill, C., & Jorgensen, L.I. (2016). An investigation into the
factor structure of the Ryff Scales of Psychological Well-being. SA
Journal of Industrial Psychology/SA Tydskrif vir Bedryfsielkunde,
42(1), a1275. doi: http://dx.doi.org/10.4102/sajip.v42i1.1275
Heppner, M.J., & Jung, A. (2012). When the music changes, so does
the dance: With shifting U.S. demographics, how do career centers
need to change? Asian Journal of Counselling, 19(1&2), pp.
27−51.
Herbst, T.H. & Maree, K.G. (2008). Thinking style preference,
emotional intelligence and leadership effectiveness. SA Journal of
Industrial Psychology, 34(1), pp. 32–41.
Herman, A. A., Stein, D. J., Seedat, S., Heeringa, S. G., Moomal, H.,
& Williams, D.R. (2009). The South African Stress and Health
(SASH) study: 12-month and lifetime prevalence of common
mental disorders. South African Medical Journal, 99(2), pp. 339–
344. Available at
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3191537/pdf/nihms32
Herrnstein, R.J. & Murray, C. (1994). The bell curve: Intelligence and
class structure in American life. New York: Free Press.
Heyns, P.M., Venter, J.H., Esterhuyse, K.G., Bam, R.H., & Odendaal,
D.C. (2003). Nurses caring for patients with Alzheimer’s disease:
Their strengths and risk of burnout. SA Journal of Psychology, 33,
pp. 80–85.
Hill, K. & Sarason, S.B. (1966). The relation of test anxiety and
defensiveness to test and school performance over the elementary
school years. Monographs of the Society for Research in Child
Development, 31(2), pp. 1–76.
Hirschfeld, R.R. (2000). Validity studies: Does revising the intrinsic
and extrinsic subscales of the Minnesota Satisfaction
Questionnaire Short Form make a difference? Educational and
Psychological Measurement, 60, pp. 255–270.
Hirschi, A., & Läge, D. (2007). Holland’s secondary constructs of
vocational interests and career choice readiness of secondary
students. Journal of Individual Differences, 28(4), pp. 205–218.
Hoffman, B.J., Gorman, C.A., Blair, C.A., Meriac, J.P., Overstreet, B.,
& Atchley, E.K. (2012). Evidence for the effectiveness of an
alternative multisource performance rating methodology.
Personnel Psychology.
Hoffman, R.R., & Deffenbacher, K.A. (1992). A brief history of applied
cognitive psychology. Applied Cognitive Psychology, 6, pp. 1−48.
Hogan, T.P. & Cannon, B. (2003). Psychological testing: A practical
introduction. New York: Wiley & Sons.
Hogan Assessment Systems (2012). Hogan Personality Inventory:
South African English manual supplement. Tulsa, OK: Author.
Hogan Assessment Systems (2014a). Motives, Values, Preferences
Inventory: South African English manual supplement. Tulsa, OK:
Author.
Hogan Assessment Systems (2014b). Hogan Development Survey:
South African English manual supplement. Tulsa, OK: Author.
Holburn, P.T. (1992). Test bias in the Intermediate Mental Alertness,
Mechanical Comprehension, Blox and High Level Figure
Classification Tests (Contract Report C/Pers 453). Pretoria: HSRC.
Holding, P., Taylor, H., Kazungu, S., Mkala, T., Gona, J., Mwamuye,
B., Mbonani, L., & Stevenson, J. (2004). Assessing cognitive
outcomes in a rural African population. Journal of the International
Neuropsychological Society, 10(2), pp. 261–270.
Holland, J.L. (1997). Making vocational choices: A theory of vocational
personalities and work environments (3rd ed.). Lutz, FL:
Psychological Assessment Resources.
Holland, P.W. & Thayer, D.T. (1988). Differential item performance
and the Mantel-Haenszel procedure. In H. Wainer & H.I. Braun
(Eds.), Test validity, pp. 129–145. Hillsdale, NJ: Lawrence Erlbaum
Associates, Inc.
Horn, B.S. (2000). A Xhosa translation of the revised NEO Personality
Inventory. Unpublished master’s dissertation, University of Port
Elizabeth, South Africa.
Howell, D.C. (1995). Fundamental statistics for the behavioral
sciences. Belmont, CA: Wadsworth Publishing.
Hresko, W.P., Hammill, D.D., Herbert, P.G. & Baroody, A.J. (1988).
Screening children for related early educational needs. Austin.
HSRC (1974). Manual for the Aptitude Test for School Beginners.
Pretoria: HSRC.
HSRC (1994). Manual for the Grover-Counter Scale of Cognitive
Development. Pretoria: HSRC.
HSRC (1999/2000). HSRC Test Catalogue: Psychological and
proficiency tests. Pretoria: HSRC.
Huarte, J. (1698) Examen de Ingenios, or, the tryal of wits:
Discovering the great difference of wits among men, and what sort
of learning suits best with each genius. Charleston, South
Carolina, US: Bibliolife DBA of BibilioBazaar II LLC.
Hugo, H.L.E. & Claassen, N.C.W. (1991). The functioning of the
GSAT Senior for students of the Department of Education and
Training (Report ED–13). Pretoria: HSRC.
Hulin, C.L., Drasgow, F., & Parsons, C.K. (1983). Item Response
Theory: Application to psychological measurement. Homewood,
Illinois: Dow Jones-Irwin.
Hurlock, E.B. (1978). Child Development: International Student
Edition (6th ed.). Tokyo: McGraw-Hill.
Huysamen, G.K. (1978). Beginsels van sielkundige meting [Principles
of Psychological Measurement]. Pretoria: H & R Academica.
Huysamen, G.K. (1986). Sielkundige meting – ’n Inleiding
[Psychological measurement: An Introduction]. Pretoria:
Academica.
Huysamen, G.K. (1996a). Psychological measurement: An
introduction with South African examples (3rd revised ed.).
Pretoria: Van Schaik.
Huysamen, G.K. (1996b). Matriculation marks and aptitude test scores
as predictors of tertiary educational success in South Africa. South
African Journal of Higher Education, 10(2), pp. 199–207.
Huysamen, G.K. (2002). The relevance of the new APA standards for
educational and psychological testing for employment testing in
South Africa. SA Journal of Psychology, 32(2), pp. 26–33.
Huysamen, G.K. (2004). Statistical and extra-statistical considerations
in differential item functioning analyses. SA Journal of Industrial
Psychology, 30(4). doi:10.4102/sajip.v30i4.173
Huysamen, G.K. (2006). Coefficient Alpha: Unnecessarily ambiguous;
unduly ubiquitous. SA Journal of Industrial Psychology, 32(4), pp.
34–40.
IBM Corporation. (2012). IBM SPSS Amos User Manual. Somers, NY:
Author.
IBM Corporation. (2012). IBM Statistical Package for the Social
Sciences (SPSS). Somers, NY: Author.
Iliescu, D. & Ispas, D. (2016). Personality assessment. In F.T. Leong,
D. Bartram, F.M. Cheung, K.F. Geisinger, & D. Iliescu (Eds.), The
ITC international handbook of testing and assessment (pp.
134−146). New York, NY: Oxford University Press.
Institute for Personality and Ability Testing. (2006). 16PF5 User’s
manual: South African version. Champaign, IL: Author.
Institute for Personality and Ability Testing (2009). 16PF South African
version: User’s manual (2nd ed.). Champaign, IL: Institute for
Personality and Ability Testing.
International Guidelines (2000). Guidelines and ethical considerations
for assessment center operations. International Task Force on
Assessment Center Guidelines. San Francisco, CA: International
Congress on Assessment Center Methods.
International Test Commission (ITC) (1999). International guidelines
for test-use (Version 2000). Prepared by D. Bartram. Available at
http://www.intestcom.org
International Test Commission (ITC) (2001). International guidelines
for test-use. Available at http://www.intestcom.org
International Test Commission (ITC) (2005, July 1). The ITC guidelines
on computer-based and Internet-delivered testing. Retrieved on
13 September 2018 from:
https://www.intestcom.org/files/guideline_computer_based_testing.p
International Test Commission (ITC) (2006). International guidelines
on computer-based and Internet-delivered testing. International
Journal of Testing, 6(2), pp. 143–171. Can also be accessed from
www.intestcom.org
International Test Commission (ITC) (2010a, July). The ITC test-
taker’s guide to technology-based testing. Retrieved on 13
September 2018 from:
https://www.intestcom.org/files/test_taker_guide_brochure.pdf
International Test Commission (ITC) (2010b). A test-taker’s guide to
technology-based testing. [http://www.intestcom.org].
International Test Commission (ITC) (2012). International Test
Commission Guidelines for Quality Control in Scoring, Test
Analysis, and Reporting of Test Scores. [Retrieve from
http://www.intestcom.org].
International Test Commission (ITC) (2016). The ITC guidelines for
translating and adapting tests (2nd ed.). Available at
http://www.intestcom.org.
Ireton, H. (1988). Preschool Development Inventory. Minneapolis:
Behavior Science Systems.
Isaacson, L.E. & Brown, D. (1997). Career information, career
counselling and career development. Boston, MA: Allyn and
Bacon.
Ivey, A. E., Ivey, M. B., & Simek-Morgan, L. (1997). Counselling and
psychotherapy: A multicultural perspective (4th ed.). Boston, MA:
Allyn and Bacon.
Jacklin, L. & Cockcroft, K. (2013). The Griffiths Mental Developmental
Scales: An overview and a consideration of their relevance for
South Africa. In S. Laher & K. Cockcroft (Eds.), Psychological
assessment in South Africa: Research and applications (pp. 169−
185). Johannesburg: Wits University Press.
James, L., Jacobs, K.E., & Roodenburg, J. (2015). Adoption of the
Cattell-Horn-Carroll model of cognitive abilities by Australian
psychologists. Australian Psychologist, 50, pp. 194−202.
Jansen, P. & Greenop, K. (2008). Factor analyses of the Kaufman
assessment battery for children assessed longitudinally at 5 and
10 years. SA Journal of Psychology, 38(2), pp. 355–365.
Janse van Rensburg, K. & Roodt, G. (2005). A normative instrument
for assessing the mentoring role. SA Journal of Human Resource
Management, 3(3), pp. 10–19.
Johnson, T.P. (1998). Approaches to equivalence in cross-cultural
and cross-national survey research. ZUMA-Nachrichten Spezial, 3,
pp. 1–40.
Johnson, T.P., Shavitt, S., & Holbrook, A.L. (2011). Survey response
styles across cultures. In D. Matsumoto & F.J.R. van de Vijver,
Cross-cultural Research Methods in Psychology (pp. 130–178).
Cambridge, NY: Cambridge University Press.
Johnson, W., Bouchard, T.J., Krueger, R.F., McGue, M., &
Gottesman, I.I. (2004). Just one g: Consistent results from three
test batteries. Intelligence, 32, pp. 95–107.
Jones, N., Hill, C., & Henn, C. (2015). Personality and job satisfaction:
Their role in work-related psychological well-being. Journal of
Psychology in Africa, 25(4), pp. 297−304. doi:
10.1080/14330237.2015.1078086
Jöreskog, K.G., & Sörbom, D. (2012). LISREL 8.8. Chicago: Scientific
Software International.
Joska, J.A., Westgarth-Taylor, J.W., Hoare, J., Thomas, K.G.F., Paul,
R., Myer, L., & Stein, D.J. (2011). Validity of the International HIV
Dementia Scale in South Africa. AIDS Patient Care and STDs, 25,
pp. 95–101.
Joubert, M. (1981). Manual for the Interpersonal Relations
Questionnaire. Pretoria: HSRC.
Joubert, M. (1988). Manual for the School-readiness Evaluation by
Trained Testers (SETT). Pretoria: National Institute for
Psychological and Edumetric Research of the HSRC.
Joubert, T. & Kriek, H.J. (2009). Psychometric comparison of paper-
and-pencil and online personality assessments in a selection
setting. SA Journal of Industrial Psychology/SA Tydskrif vir
Bedryfsielkunde, 35(1), pp. 78–88, Art. #727, 11 pages.
doi: 10.4102/sajip.v35i1.727.
JvR Africa Group (2017) Catalogue. Johannesburg, South Africa: JvR
Africa Group.
JvR Psychometrics (2015). Ravens Progressive Matrices: South
African research supplement. Johannesburg, South Africa: Author
Joyce-Beaulieu, D. & Calleung, C.M. (2016). Children and
adolescents. In F.T.L. Leong, D. Bartram. F.M. Cheung, K.F.
Geisinger & Iliescu, D. (Eds.), The ITC international handbook of
testing and assessment (pp. 276−289). New York, NY: Oxford
University Press.
Kafka, G.J. & Kozma, A. (2002). The construct validity of Ryff’s scales
of psychological well-being (SPWB) and their relationship to
measures of subjective well-being. Social Indicators Research, 57,
pp. 171−190.
Kamara, A. (1971). Cognitive development among Themne children of
Sierra Leone. Unpublished doctoral dissertation, University of
Illinois at Urbana-Champaign, US.
Kammann, R. & Flett, R. (1983). Affectometer 2: A scale to measure
current level of general happiness. Australian Journal of
Psychology, 35(2), pp. 259–265.
Kanjee, A. (2007). Using logistic regression to detect bias when
multiple groups are tested. SA Journal of Psychology, 37(1), pp.
47–61.
Kaplan, R.M. & Saccuzzo, D.P. (1997). Psychological testing:
Principles, applications, and issues (4th ed.). Pacific Grove,
California, USA: Brooks/Cole Publishing Company.
Kaplan, R.M. & Saccuzzo, D.P. (2009). Psychological testing:
Principles, applications, and issues (7th ed.). Belmont, California,
USA: Wadsworth, Cengage Learning.
Kathuria, R., & Serpell, R. (1999). Standardization of the Panga
Munthu Test, a nonverbal cognitive test developed in Zambia. Journal of
Negro Education, 67, pp. 228−241.
Kaufman, A.S. (1979). Intelligent testing with the WISC–R. New York:
John Wiley & Sons.
Kennepohl, S., Shore, D., Nabors, N., & Hanks, R. (2004). African
American acculturation and neuropsychological test performance
following traumatic brain injury. Journal of the International
Neuropsychological Society, 10(4), pp. 566–577.
Kennet-Cohen, T., Bronner, S. & Cohen, Y. (2003). Improving the
predictive validity of a test: A time-efficient perspective. Paper
presented at the annual meeting of the National Council on
Measurement in Education, Chicago, 2003.
Kerlinger, F.N. & Lee, H.B. (2000). Foundations of Behavioral
Research (4th ed.). Fort Worth, TX: Harcourt College Publishers.
Kern, R.S., Nuechterlein, K.H., Green, M.F., Baade, L.E., Fenton,
W.S., Gold, J.M. et al. (2008). The MATRICS Consensus
Cognitive Battery, Part 2: Co-Norming and Standardization. The
American Journal of Psychiatry, 165, pp. 214–220.
Keyes, C.L. (2005). Mental illness and/or mental health? Investigating
axioms of the complete state model of health. Journal of
Consulting and Clinical Psychology, 73(3), pp. 539−548. doi:
10.1037/0022-006x.73.3.539
Keyes, C.L.M. & Haidt, J. (2003). Introduction: Human flourishing –
the study of that which makes life worthwhile. In C.L.M. Keyes & J.
Haidt (Eds.), Flourishing: Positive psychology and the life well-lived.
Washington, DC: American Psychological Association.
Keyes, C.L.M., Wissing, M., Potgieter, J.P., Temane, M., Kruger, A.,
& Van Rooy, S. (2008). Evaluation of the mental health continuum –
short form (MHC-SF) in Setswana-speaking South Africans. Clinical
Psychology & Psychotherapy, 15, pp. 181–192.
Khan, S.M. (2017). Multimodal behavioural analytics in intelligent
learning and assessment systems. In A.A. Von Davier, M. Zhu, &
P. Kyllonen (Eds.), Methodology of educational measurement and
assessment: Innovative assessment of collaboration (pp.
173−184). Cham: Springer International Publishing.
Kinnear, C. & Roodt, G. (1998). The development of an instrument for
measuring organisational inertia. Journal of Industrial Psychology,
24(2), pp. 44–54.
Kim, D.H. & Smith, J.D. (2010). Evaluation of two observational
assessment systems for children’s developmental learning. NHSA
Dialog: A Research-to-Practice Journal for the Early Childhood
Field, 13(4), pp. 253−267. doi: 10.1080/15240754.2010.527022.
Kitshoff, C., Campbell, L., & Naidoo, S.S. (2012). The association
between depression and adherence to antiretroviral therapy in
HIV-positive patients, KwaZulu-Natal, South Africa. South African
Family Practice, 54(2), pp. 145−150.
Kline, P. (1994). An easy guide to factor analysis. London: Routledge.
Kline, R.B. (2011). Principles and practice of Structural Equation
Modeling (3rd ed.). New York: Guilford.
Knoetze, J. (2013). Sandworlds, storymaking, and letter writing: The
therapeutic sandstory method. South African Journal of
Psychology, 43, pp. 459−469.
Koch, S.E., Foxcroft, C.D., & Watson, A.S.R. (2001). A developmental
focus to student access at UPE: Process and preliminary insights
into placement assessment. South African Journal of Higher
Education, 15(2), pp. 126–131.
Koller, P.S. & Kaplan, R.M. (1978). A two-process theory of learned
helplessness. Journal of Personality and Social Psychology, 36,
pp. 1077–1083.
Kroukamp, T.R. (1991). The use of personality-related and cognitive
factors as predictors of scholastic achievement in Grade One: A
multivariate approach. Unpublished doctoral dissertation,
University of Port Elizabeth.
Krug, S.E. & Johns, E.F. (1986). A large-scale cross-validation of
second-order personality structure defined by the 16PF.
Psychological Reports, 59, pp. 683–693.
Kruk, R.S. & Muter, P. (1984). Reading continuous text on video
screens. Human Factors, 26, pp. 339–345.
Kuncel, N.R., Hezlett, S.A., & Ones, D.S. (2004). Academic
performance, career potential, creativity, and job performance:
Can one construct predict them all? Journal of Personality and
Social Psychology, 86, pp. 148–161.
Laher, S. (2011). The applicability of the NEO-PI-R and the CPAI-2 in
the South African context. Unpublished PhD dissertation,
University of the Witwatersrand, Johannesburg.
Laher, S., & Cockcroft, K. (2013). Psychological assessment in South
Africa: Research and applications. Johannesburg: Wits University
Press.
Laher, S., & Cockcroft, K. (2017). Moving from culturally biased to
culturally responsive assessment practices in low-resource,
multicultural settings. Professional Psychology: Research and
Practice, 48(2), pp. 115−121. doi:
http://dx.doi.org/10.1037/pro0000102
Landman, J. (1988). Appendix to the manual of the Junior South
African Individual Scales (JSAIS): Technical details and tables of
norms with regard to 6- to 8-year-old Indian pupils. Pretoria:
HSRC.
Langley, P.R. (1989). Gerekenariseerde loopbaanvoorligting: ’n
Evaluering van die DISCOVER-stelsel [Computerized career
counselling: An evaluation of the Discover system]. Unpublished
doctoral thesis. Rand Afrikaans University.
Langley, R. (1993). Manual for use of the Life Role Inventory. Pretoria:
HSRC.
Langley, R. (1995). The South African work importance study. In D.E.
Super, B. Sverko & C.E. Super (Eds.) Life roles, values, and
careers, pp. 188–203. San Francisco, CA: Jossey-Bass.
Langley, R., Du Toit, R., & Herbst, D.L. (1992a). Manual for the
Career Development Questionnaire. Pretoria: HSRC.
Langley, R., Du Toit, R., & Herbst, D.L. (1992b). Manual for the
Values Scale. Pretoria: HSRC.
Lanyon, R.I. & Goodstein, L.D. (1997). Personality assessment (3rd
ed.). New York: Wiley.
Larsen, R.J. & Buss, D.M. (2002). Personality psychology: Domains of
knowledge about human nature. Boston, MA: McGraw-Hill.
Lee, J.A. (1986). The effects of past computer experience on
computerized aptitude test performance. Educational and
Psychological Measurement, 46, pp. 727–733.
Leeson, H.V. (2006). The mode effect: A literature review of human
and technological issues in computerized testing. International
Journal of Testing, 6(1), pp. 1–24.
Leiter, M.P. & Maslach, C. (2005). Banishing burnout: Six strategies
for improving your relationship with work. San Francisco,
California: Jossey-Bass.
Leong, F.T.L., Bartram, D., Cheung, F.M., Geisinger, K.F., & Iliescu,
D. (2016). The ITC international handbook of testing and
assessment. New York, NY: Oxford University Press.
Le Roux, M.C. & Kagee, A. (2008). Subjective wellbeing of primary
healthcare patients in the Western Cape, South Africa. South
African Family Practice, 50(3), pp. 68–68.
Levin, M.M., & Buckett, A. (2011). Discourses regarding ethical
challenges in assessments – Insights through a novel approach.
SA Journal of Industrial Psychology/SA Tydskrif vir
Bedryfsielkunde, 37(1), Art. #949, pp. 251–263.
http://dx.doi.org/10.4102/sajip.v37i1.949
Lewandowski, D.G. & Saccuzzo, D.P. (1976). The decline of
psychological testing: Have traditional procedures been fairly
evaluated? Professional Psychology, 7, pp. 177–184.
Lezak, M.D. (1987). Neuropsychological assessment (2nd ed.). New
York: Oxford University Press.
Lezak, M.D. (1995). Neuropsychological Assessment (3rd ed.). New
York: Oxford University Press.
Lezak, M., Howieson, D., & Loring, D. (2004) Neuropsychological
assessment (4th ed.). New York: Oxford University Press.
Lievens, F. (2006). The use of Situational Judgment Tests (SJT) in
high-stakes selection settings. Keynote address presented at the
26th Annual Conference of the AC study Group, 23–24 March,
Protea Hotel, Stellenbosch.
Lievens, F., Peeters, H., & Schollaert, E. (2008). Situational judgment
tests: A review of recent research. Personnel Review, 37(4), pp.
426–441. doi:10.1108/00483480810877598
Likert, R. (1932). A technique for the measurement of attitudes.
Archives of Psychology, No. 140.
Lilford, N. (10 September 2008). Absenteeism reaches new heights in
South Africa. HR Future. Available from
http://www.hrfuture.net/display_web_article.php?
article_id=144&category_id=5
Linn, R.L. (1989). Current perspectives and future directions. In R.L.
Linn (Ed.), Educational measurement (3rd ed.). New York:
Macmillan.
Linn, R.L. & Gronlund, N.E. (2000). Measurement and assessment in
teaching (8th ed.). Upper Saddle River, New Jersey: Prentice-Hall,
Inc.
Llorens, S., Bakker, A.B., Schaufeli, W., & Salanova, M. (2006).
Testing the robustness of the job demands-resources model.
International Journal of Stress Management, 13(3), pp. 378–391.
Locke, E.A. (1969). What is job satisfaction? Organizational Behavior
and Human Performance, 4, pp. 309–336.
Locurto, C. (1991). Sense and nonsense about IQ – the case for
uniqueness. New York: Praeger.
Loeppke, R. (2008). The value of health and the power of prevention.
International Journal of Workplace Health Management, 1(23), pp.
95–108.
Loeppke, R., Taitel, M., Haufle, V., Parry, T., Kessler, R.C., & Jinnett,
K. (2009). Health and productivity as a business strategy: A
multiemployer study. Journal of Occupational and Environmental
Medicine, 51(4), pp. 411–428.
Lopes, A., Roodt, G., & Mauer, R. (2001). The predictive validity of the
APIL-B in a financial institution. Journal of Industrial Psychology,
27(1), pp. 61–69.
Lord, F.M. (1980). Applications of Item Response Theory to practical
testing problems. Hillsdale, NJ: Erlbaum.
Louw, D.A. & Allan, A. (1998). A profile of forensic psychologists in
South Africa. SA Journal of Psychology, 28(4), pp. 234–241.
Lovibond, S.H., & Lovibond, P.F. (1995). Manual for the Depression
Anxiety Stress Scales. Psychology foundation monograph (2nd
ed.). Sydney, Australia: Psychology Foundation of Australia.
Lucas, M. (2013). Neuropsychological assessment in South Africa. In
S. Laher & K. Cockcroft (Eds.), Psychological assessment in South
Africa: Research and applications (pp. 186−200). Johannesburg:
Wits University Press.
Luecht, R.M. (2008). Assessment engineering in test design,
development, assembly and scoring. Keynote address presented
at the East Coast Organization of Language Testers (ECOLT)
Conference, Washington, DC. Retrieved on 30 November 2012
from http://www.govtilr.org/Publications/ECOLT08-AEKeynote-
RMLuecht-07Nov08[1].pdf.
Luecht, R.M. (2009). Adaptive computer-based tasks under an
assessment engineering paradigm. In D.J. Weiss (Ed.),
Proceedings of the 2009 GMAC Conference on Computerized
Adaptive Testing. Retrieved 30 November 2012 from
www.psych.umn.edu/psylabs/CATCentral/
Luiz, D.M. (1994). Children of South Africa: In search of a
developmental profile. Inaugural and Emeritus Address D34. Port
Elizabeth: University of Port Elizabeth.
Luiz, D.M. (1997). Griffiths Scales of Mental Development: South
African studies. Port Elizabeth: University of Port Elizabeth.
Luiz, D.M., Barnard, A., Knoesen, N., Kotras, N., Horrocks, S.,
McAlinden, P., Challis, D., & O’Connell, R. (2006). Griffiths Mental
Development Scales – Extended Revised. Administration Manual.
Oxford: Hogrefe – The Test Agency Ltd.
Luiz, D.M., Foxcroft, C.D., & Tukulu, A.N. (2004). The Denver II Scales
and the Griffiths Scales of Mental Development: A correlational
study. Journal of Child and Adolescent Mental Health, 16(2), pp.
77–81. doi: 10.2989/17280580409486573
Luiz, D.M., & Heimes, L. (1988). The Junior South African Intelligence
Scales and the Griffiths Scales of Mental Development: A
correlative study. In D.M. Luiz (Ed.), Griffiths Scales of Mental
Development: South African studies, pp. 1–15. Port Elizabeth:
University of Port Elizabeth.
Lundy, A. (1985). The reliability of the Thematic Apperception Test.
Journal of Personality Assessment, 49, pp. 141–145.
MacAleese, R. (1984). Use of tests. In H.D. Burck & R.C. Reardon
(Eds.), Career development interventions. Springfield, IL: Charles
C. Thomas.
Maccow, G. (2011). Administration, scoring, and basic interpretation
of the Wechsler Adult Intelligence Scale, Fourth Edition. Pearson
Education, Inc. Available at
http://images.pearsonclinical.com/Images/PDF/WAIS-
IV_Adm_Scoring_Int_Sep_2011_Handout.pdf
Madge, E.M. (1981). Manual for the Junior South African Individual
Scales (JSAIS). Part I: Development and standardization. Pretoria:
HSRC.
Madge, E.M. & Du Toit, L.B.H. (1967). ’n Samevatting van die
bestaande kennis in verband met die primêre persoonlikheids-
faktore wat deur die Persoonlikheids-vraelys vir Kinders (PVK), die
Hoërskool-persoonlikheidsvraelys (HSPV) en die Sestien-
persoonlikheidsfaktorvraelys (16PF) gemeet word. (Byvoegsel tot
die handleiding vir die PVK, HSPV en 16PF). [A summary of the
existing knowledge regarding the primary personality factors that
are measured by the Personality Questionnaire for Children (PVK),
the High School Personality Questionnaire (HSPV) and the Sixteen
Personality Factor Questionnaire (16PF) (Supplement to the manual
for the PVK, HSPV and 16PF)]. Pretoria: HSRC.
Madge, E. & Van der Walt, H.S. (1997). Interpretation and use of
psychological tests. In K. Owen & J.J. Taljaard (Eds.), Handbook for the
use of psychological and scholastic tests of the HSRC (revised
edition; pp. 121–137). Pretoria: HSRC.
Major, J.T., Johnson, W., & Deary, I.J. (2012). Comparing models of
intelligence in Project TALENT: The VPR model fits better than the
CHC and extended Gf-Gc models. Intelligence, 40, pp. 543–559.
Makhubela, M. (2016). ‘From psychology in Africa to African
psychology’: Going nowhere slowly. Psychology in Society (PINS),
52, pp. 1–18. Retrieved on 12 April 2018 from
http://dx.doi.org/10.17159/2309-8708/2016/n52a1.
Makransky, G., & Glas, C.A.W. (2013). The applicability of
multidimensional computerized adaptive testing for cognitive ability
measurement in organizational assessment. International Journal
of Testing, 13, pp. 123−139.
Malda, M., Van de Vijver, F.J.R., & Temane, Q.M. (2010). Rugby
versus soccer in South Africa: Content familiarity contributes to
cross-cultural differences in cognitive test scores. Intelligence, 38,
pp. 582–595.
Malebo, A., Van Eeden, C., & Wissing, M.P. (2007). Sport
participation, psychological well-being, and psychosocial
development in a group of young black adults. SA Journal of
Psychology, 37(1), pp. 188–206.
Mall, S., Lund, C., Vilagut, G., Alonso, J., Williams, D. R., & Stein, D.J.
(2015). Days out of role due to mental and physical illness in the
South African stress and health study. Social Psychiatry and
Psychiatric Epidemiology, 50(3), pp. 461−468.
Mardell-Czudnowski, C.D. & Goldenberg, D.S. (1990). Developmental
Indicators for the Assessment of Learning – Revised. Circle Pines,
MN: American Guidance Service.
Maree, J.G. (2007). First steps in developing an interest inventory to
facilitate narrative counselling. In K. Maree (Ed.), Shaping the
story: A guide to facilitating narrative counselling (pp. 176–205).
Pretoria, South Africa: Van Schaik.
Maree, J.G. (2010a). Brief overview of the advancement of
postmodern approaches to career counselling. Journal of
Psychology in Africa, 20(3), pp. 361–368.
Maree, J.G. (2010b). Career Interest Profile technical manual (3rd
ed.). Johannesburg: Jopie van Rooyen & Partners.
Maree, J.G. (2013). Latest developments in career counselling in
South Africa: Towards a positive approach. South African Journal
of Psychology, 43(4), pp. 409−421.
Maree, J.G., & Morgan, B. (2012). Toward a combined qualitative-
quantitative approach: Advancing postmodern career counselling
theory and practice. Cypriot Journal of Educational Sciences, 7(4),
pp. 311−325.
Maree, J.G. & Taylor, N. (2016). Development of the Maree Career
Matrix: a new interest inventory. South African Journal of
Psychology, 46(4), pp. 462–476.
Maree, K. (2017). Life design counselling. In G.B. Stead & M.B.
Watson (Eds.), Career psychology in the South African context
(pp. 105-118). Pretoria, South Africa: Van Schaik.
Markus, H. R., & Kitayama, S. (1998). The cultural psychology of
personality. Journal of Cross-Cultural Psychology, 29, pp. 63–87.
doi:10.1177%2F0022022198291004
Marquez, J. (2008). Study: Wellness plans failing to engage workers.
Workforce Management, 87(9), p. 15.
Maslach, C. & Jackson, S.E. (1981). The measurement of
experienced burnout. Journal of Occupational Behaviour, 2, pp.
99–113.
Maslach, C., Jackson, S.E., & Leiter, M.P. (1996). Maslach Burnout
Inventory: Manual (3rd ed.). Palo Alto, CA: Consulting
Psychologists Press.
Maslach, C., Jackson, S.E., Leiter, M.P., Schaufeli, W.B., & Schwab,
R.L. (n.d.). The leading measure of burnout. Retrieved from
http://www.mindgarden.com/products/mbi.htm
Maslach, C., Schaufeli, W.B., & Leiter, M.P. (2001). Job burnout.
Annual Review of Psychology, 52, pp. 397–422.
Mason, J.C. (1992, July). Healthy equals happy plus productive.
Management Review, pp. 33–37.
Mason, J.C. (1994, July). The cost of wellness. Management Review,
pp. 29–32.
Matarazzo, J. (1990). Psychological assessment versus psychological
testing: Validation from Binet to the school, clinic and courtroom.
American Psychologist, 45(9), pp. 999–1017.
Matsumoto, D.R. (2000). Culture and psychology: People around the
world (2nd ed.). London: Wadsworth.
Matsumoto, D., & Van de Vijver, F.J.R. (Eds.) (2011). Cross-cultural
research methods in psychology. Cambridge: Cambridge
University Press.
Matthews, G., Zeidner, M., & Roberts, R.D. (2002). Emotional
intelligence: Science and myth. Cambridge, MA: MIT Press.
Mattke, S., Liu, H., Caloyeras, J., Huang, C. Y., Van Busum, K. R.,
Khodyakov, D., & Shier, V. (2013). Workplace wellness programs
study. Rand Health Quarterly, 3(2): 7.
Mayer, J.D. & Salovey, P. (1997). What is emotional intelligence? In
P. Salovey & D. Sluyter (Eds.), Emotional development and
emotional intelligence: Implications for educators, pp. 3–31. New
York: Basic Books.
Mboweni, J.C. (2007). The role of attributional styles, satisfaction with
life, general health and self-esteem on the psychological well-
being of adolescents. Unpublished
doctoral dissertation, North-West University, Mafikeng, South Africa.
McAdams, D.P. (1994). The person: An introduction to personality
psychology. Fort Worth, TX: Harcourt Brace.
McAdams, D.P. (1995). What do we know when we know a person?
Journal of Personality, 63, pp. 365–396.
McCarthy, D. (1944). A study of the reliability of the Goodenough
drawing test of intelligence. Journal of Psychology, 18, pp. 201–
216.
McCaulley, M.H. (2000). The Myers-Briggs Type Indicator in
counseling. In C.E. Watkins, Jr. & V.L. Campbell (Eds.), Testing
and assessment in counseling practice (2nd ed.), pp. 111–173.
Mahwah, NJ: Lawrence Erlbaum Publishers.
McClaran, A. (2003). From ‘admissions’ to ‘recruitment’: The
professionalisation of higher education admissions. Tertiary
Education and Management, 9, pp. 159–167.
McClelland, D.C., Koestner, R. & Weinberger, J. (1989). How do self-
attributed and implicit motives differ? Psychological Review, 96, pp.
690–702. doi: 10.1037/0033-295X.96.4.690
McCrae, R.R., Costa, P.T., Jr., Martin, T.A., Oryol, V.E.,
Rukavishnikov, A.A., Senin, I.G., Hrebícková, M., & Urbánek, T.
(2004). Consensual validation of personality traits across cultures.
Journal of Research in Personality, 38, pp. 179–201.
McIntire, S.A. & Miller, L. (2000). Foundations of psychological testing.
Boston: McGraw-Hill Higher Education.
McIntire, S. A., & Miller, L.A. (2006). Foundations of psychological
testing: A practical approach. California: SAGE Publications.
McMahon, M. (2011). The systems theory framework of career
development. Journal of Employment Counseling, 48(4), pp. 170–172.
McMahon, M., Patton, W. & Watson, M.B. (2003). Developing
qualitative career assessment processes. Career Development
Quarterly, 51, pp. 194–202.
McMahon, M. & Watson, M.B. (2008). Systemic influences on career
development: Assisting clients to tell their career stories. The
Career Development Quarterly, 56, pp. 280–288.
McMahon, M. & Watson, M. (Eds.) (2015). Career assessment:
Qualitative approaches. Rotterdam, The Netherlands: Sense.
McMahon, M., Watson, M. & Patton, W. (2005). Qualitative career
assessment: Developing the My System of Career Influences
Reflection Activity. Journal of Career Assessment, 13, pp. 476–
490.
McReynolds, P. (1986). History of assessment in clinical and
educational settings. In R.O. Nelson & S.C. Hayes (Eds.),
Conceptual foundations of behavioral assessment. New York: The
Guilford Press.
Mead, A.D., Olson-Buchanan, J.B., & Drasgow, F. (2013).
Technology-based selection. In M.D. Coovert & L.F. Thompson
(Eds.), The psychology of workplace technology (pp. 21−42). New
York: Routledge.
Meade, A.W. (2004). Psychometric problems and issues involved with
creating and using ipsative measures for selection. Journal of
Occupational and Organizational Psychology, 77(4), pp. 531–552.
Meiring, D. (2007). Bias and equivalence of psychological measures
in South Africa. Ridderkerk, Netherlands: Ridderprint B.V.
Meiring, D., Van de Vijver, F.J.R. & Rothmann, S. (2006). Bias in an
adapted version of the 15FQ+ in South Africa. SA Journal of
Psychology, 36(2), pp. 340–356.
Meiring, D., Van de Vijver, F.J.R., Rothmann, S. & Barrick, M.R.
(2005). Construct, item and method bias of cognitive and
personality measures in South Africa. SA Journal of Industrial
Psychology, 31, pp. 1–8.
Meiring, D., Van de Vijver, F.J.R., Rothmann, S. & Sackett, P.R.
(2006). Internal and external bias of cognitive and personality
measures in South Africa. In Meiring, D. (2007), Bias and
equivalence of psychological measures in South Africa, (pp. 67–
102). Ridderkerk, Netherlands: Ridderprint B.V.
Meredith, W. (1993). Measurement invariance, factor analysis and
factorial invariance. Psychometrika, 58(4), pp. 525–543.
Mershon, B. & Gorsuch, R.L. (1988). Number of factors in the
personality sphere. Journal of Personality and Social Psychology,
55, pp. 675–680.
Meyer, J.C. (2003). Manual for the Meyer Interest Questionnaire (MB-
10) and structured vocational guidance. Stellenbosch: University of
Stellenbosch.
Meyer-Rath et al. (2012). Company-level ART provision to employees
is cost saving: A modeled cost-benefit analysis of the impact of
HIV and antiretroviral treatment in a mining workforce in South
Africa, International AIDS Society.
Meyer-Rath, G., Pienaar, J., Brink, B., Van Zyl, A., Muirhead, D.,
Grant, A., Churchyard, G., Watts, C., & Vickerman, P. (2015). The
impact of company-level ART provision to a mining workforce in
South Africa: A cost-benefit analysis. PLoS Medicine, 12(9),
e1001869. doi:10.1371/journal.pmed.1001869
Mfusi, S.K. & Mahabeer, M. (2000). Psychosocial adjustment of
pregnant women infected with HIV/AIDS in South Africa. Journal of
Psychology in Africa, South of the Sahara, the Caribbean and
Afro-Latin America, 10(2), pp. 122–145.
Michas, M. (2018). Validation of the Centre for Epidemiologic Studies
Depression Scale: Revised among a non-clinical sample in the
South African workplace. Unpublished master’s dissertation,
University of Johannesburg, South Africa.
Miller, L.A., Lovler, R.L., & McIntire S.A. (2011). Foundations of
Psychological Testing: A practical approach (3rd ed.). Thousand
Oaks, CA: Sage.
Miller, R. (1987). Methodology: A focus for change. In K.F. Mauer &
A.I. Retief (Eds.), Psychology in context: Cross-cultural research
trends in South Africa. Pretoria: HSRC.
Millsap, R.E. (2011). Statistical approaches to measurement
invariance. New York: Routledge/Taylor & Francis.
Mislevy, R.J., Oranje, A., Bauer, M.I., Von Davier, A., Hao, J.,
Corrigan, S., Hoffman, S., DiCerbo, K., & John, M. (2014).
Psychometric considerations in game-based assessments.
Redwood City: Glasslab Institute of Play.
Mistry, J. & Rogoff, B. (1994). Remembering in cultural context. In W.
Lonner & R. Malpass (Eds.), Psychology and culture, pp. 139–144.
MA: Allyn & Bacon.
Mkhize, N. (2004). Psychology: An African perspective. In K. Ratele,
N. Duncan, D. Hook, P. Kiguwa, N. Mkhize & A. Collins (Eds.),
Self, community & psychology (pp. 4.1–4.29). Lansdowne, South
Africa: University of Cape Town Press.
Moch, S.L., Panz, V.R., Joffe, B.I., Havlik, I., & Moch, J.D. (2003).
Longitudinal changes in pituitary-adrenal hormones in South
African women with burnout. Endocrine, 21(3), pp. 267–272. doi:
0969-711X/03/21:267-272.
Moletsane, M.K. (2004). The efficacy of the Rorschach among black
learners in South Africa. Unpublished doctoral thesis, University of
Pretoria. Available at
https://repository.up.ac.za/handle/2263/27930.
Moletsane, M. (2016). Understanding the role of indigenous
knowledge in psychological assessment and intervention in a
multicultural South African context. In R. Ferreira (Ed.),
Psychological assessment: Thinking innovatively in contexts of
diversity (pp. 20−36). Cape Town: Juta and Company.
Moletsane, M., & Eloff, I. (2006). The adjustment of Rorschach
Comprehensive System procedures for South African learners.
South African Journal of Psychology, 36, pp. 880−902.
Moodie, S., Daneri, P., Goldhagen, S., Halle, T., Green, K., &
LaMonte, L. (2014). Early childhood developmental screening: A
compendium of measures for children ages birth to five (OPRE
Report 2014-11). Washington, DC: Office of Planning, Research
and Evaluation, Administration for Children and Families, U.S.
Department of Health and Human Services.
Moodle HQ (2017). About Moodle. Retrieved on 13 September 2018
from: https://docs.moodle.org/34/en/About_Moodle
Moodley, N. (2005, February 9). Absenteeism cost R12bn a year, with
up to R2.2bn due to Aids. Business Report. Retrieved September
10, 2008, from http://www.busrep.co.za/index.php?
fSectionId=611&fArticleId=2403031
Moos, R. & Moos, B. (1986). Family Environment Scale manual (2nd
ed.). Palo Alto, California: Consulting Psychologists Press.
Moosa, M.Y.H. (2009). Anxiety associated with colposcopy at Chris
Hani Baragwanath hospital, Johannesburg. South African Journal
of Psychiatry, 15(2), pp. 48–52.
Morelli, N.A., Mahan, R.P., & Illingworth, A.J. (2014). Establishing the
measurement equivalence of online selection assessments
delivered on mobile versus non-mobile devices. International
Journal of Selection and Assessment, 22(2), pp. 125−138.
Mosam, A., Vawda, N.B., Gordhan, A.H., Nkwanyanat, N., &
Aboobaker, J. (2004). Quality of life issues for South Africans with
acne vulgaris. Clinical and Experimental Dermatology, 30, pp. 6–9.
doi: 10.1111/j.1365-2230.2004.01678.x
Mosdell, J., Balchin, R., & Ameen, O. (2010). Adaptation of aphasia
tests for neurocognitive screening in South Africa. SA Journal of
Psychology, 40(3), pp. 250–261.
Morgan, B. (2010). Career counselling in the 21st century: A reaction
article. Journal of Psychology in Africa, 20(3), pp. 501–504.
Morgan, B., de Bruin, G.P., & de Bruin, K. (2015a). Constructing
Holland’s hexagon in South Africa: Development and initial
validation of the South African Career Interest Inventory. Journal of
Career Assessment, 23(3), pp. 493−511. doi:
10.1177/1069072714547615
Morgan, B., de Bruin, G.P., & de Bruin, K. (2015b). Gender
differences in Holland’s circular/circumplex interest structure as
measured by the South African Career Interest Inventory. South
African Journal of Psychology, 45(3), pp. 349−360. doi:
10.1177/1069072714547615
Morgan, C.D. & Murray, H.A. (1935). A method for investigating
fantasies: The Thematic Apperception Test. Archives of Neurology
and Psychiatry, 34, pp. 289–306.
Moritz, S., Ferahli, S., & Naber, D. (2004). Memory and attention
performance in psychiatric patients. Journal of the International
Neuropsychological Society, 10(4), pp. 623–633.
Morris-Paxton, A. A., Van Lingen, J. M., & Elkonin, D. (2017a). An
evaluation of health information and wellness priorities among
socioeconomically disadvantaged students. Health Education
Journal, 76(3), pp. 271−281.
Morris-Paxton, A. A., Van Lingen, J. M., & Elkonin, D. (2017b).
Wellness and academic outcomes among disadvantaged students
in South Africa: An exploratory study. Health Education Journal,
76(1), pp. 66−76.
Moyo, S., & Theron, C. (2011). A preliminary factor analytic
investigation into the first-order factor structure of the Fifteen
Factor Questionnaire Plus (15FQ+) on a sample of Black South
African managers. SA Journal of Industrial Psychology/SA Tydskrif
vir Bedryfsielkunde, 37(1), Art. #934, 22 pages. doi:
10.4102/sajip.v37i1.934
Mpofu, E. (2002). Indigenization of the psychology of human
intelligence in sub-Saharan Africa. In W.J. Lonner, D.L. Dinnel,
S.A. Hayes, & D.N. Sattler (Eds.), Online readings in psychology
and culture, 5(2). Center for Cross-Cultural Research, Western
Washington University, Bellingham, Washington USA.
Mpofu, E., Ntinda, K. & Oakland, T. (2012). Understanding human
abilities in sub-Saharan African settings. Online readings in
psychology and culture, 4(3). doi: https://doi.org/10.9707/2307-
0919.1036.
Msolo, S. (2017). Campus home to SA’s Serious Games Institute.
North-West University Vaal Triangle News. Retrieved on 13
September 2018 from: www.nwu.ac.za/content/vaal-triangle-
campus-news-campus-home-sa’s-serious-games-institute
Muleya, V.R., Fourie, L., & Schlebusch, S. (2017). Ethical challenges
in assessment centres in South Africa. SA Journal of Industrial
Psychology/SA Tydskrif vir Bedryfsielkunde, 43(0), pp. 1−20,
a1324. doi: https://doi.org/10.4102/sajip.v43i0.1324
Murphy, K.R. & Davidshofer, C.O. (1998). Psychological testing:
Principles and applications (4th ed.). New Jersey: Prentice-Hall
International, Inc.
Murphy, K.R., & Davidshofer, C.O. (2005). Psychological testing:
Principles and applications (6th ed.). Upper Saddle River, New
Jersey: Pearson Prentice Hall.
Murphy, A. & Janeke, H.C. (2009). The relationship between thinking
styles and emotional intelligence: An exploratory study. SA Journal
of Psychology, 39(3), pp. 357–375.
Murphy, R., & Maree, D.J.F. (2006). A review of South African
research in the field of dynamic assessment. SA Journal of
Psychology, 36(1), pp. 168–191.
Murphy, E., & Meisgeier, C. (2008). MMTIC™ Manual: A guide to the
development and use of the Murphy-Meisgeier Type Indicator for
Children®. Gainesville, FL: Center for Applications of
Psychological Type.
Murstein, B.I. (1972). Normative written TAT responses for a college
sample. Journal of Personality Assessment, 36 (2), pp. 109–147.
Muthén, B.O. (2002). Beyond SEM: General latent variable modeling.
Behaviormetrika, 29(1), pp. 81–117.
Muthén, L.K., & Muthén, B.O. (1998–2017). Mplus User’s Guide (7th
ed.). Los Angeles, CA: Muthén & Muthén.
Mwamwenda, T.S. (1995). Educational Psychology: An African
perspective (2nd ed.). Durban: Butterworths.
Myers, I.B. & McCaulley, M.H. (1985). Manual: A guide to the
development and use of the Myers-Briggs Type Indicator. Palo
Alto, CA: Consulting Psychologists Press.
Naglieri, J.A. (1997). Planning, attention, simultaneous, and
successive theory and the cognitive assessment system: A new
theory-based measure of intelligence. In D.P. Flanagan, J.L.
Genshaft & P.L. Harrison (Eds.), Contemporary intellectual
assessment: Theories, tests and issues, pp. 247–267. New York:
Guilford Press.
Naicker, A. (1994). The psycho-social context of career counselling in
South African schools. SA Journal of Psychology, 24, pp. 27–34.
Naicker, R., & Fouché, C. (2003). The evaluation of an insourced
employee assistance programme. SA Journal of Human Resource
Management, 1(1), pp. 25−31.
Naidoo, D. (2013). Traumatic brain injury: The South African
landscape. The South African Medical Journal, 103(9), pp.
613−614. doi:10.7196/SAMJ.7325.
National Institutes of Health (2017). The NIH toolbox.
HealthMeasures. Retrieved on 13 September 2018 from:
http://www.healthmeasures.net/explore-measurement-
systems/nih-toolbox
Naude, J.L.P., & Rothmann, S. (2004). The validation of the Maslach
Burnout Inventory – Human services survey for emergency
medical technicians in Gauteng. SA Journal of Industrial
Psychology, 30, pp. 21–28.
Nauta, M.M. (2010). The development, evolution, and status of
Holland’s theory of vocational personalities: Reflections and future
directions for counseling psychology. Journal of Counseling
Psychology, 57(1), pp. 11–22.
Nel, J. A., Valchev, V. H., Rothmann, S., van de Vijver, F.J.R.,
Meiring, D. & de Bruin, G.P. (2012). Exploring the personality
structure in the 11 languages of South Africa. Journal of
Personality, 80, pp. 915–948. doi: 10.1111/j.1467-
6494.2011.00751.x
Nell, V. (1994). Interpretation and misinterpretation of the South
African Wechsler-Bellevue Adult Intelligence Scale: a history and a
prospectus. South African Journal of Psychology, 24(2), pp. 100–
109.
Nell, V. (1997). Science and politics meet at last: The insurance
industry and neuropsychological norms. SA Journal of Psychology,
27, pp. 43–49.
Nell, V. (1999). Cross-cultural neuropsychological assessment:
Theory and practice. New Jersey: Lawrence Erlbaum.
Nell, V. (2000). Cross-cultural neuropsychological assessment:
Theory and practice. New Jersey: Erlbaum.
Nell, R.D. (2005). Stress, coping resources and adjustment of married
mothers in the teaching profession. Unpublished master’s treatise,
Nelson Mandela Metropolitan University.
Newborg, J., Stock, J.R., Wnek, L., Guidubaldi, J., & Svinicki, J.
(1984). The Battelle Developmental Inventory. Allen, TX: DLM
Teaching Resources.
Nicholas, L. (2000). An evaluation of psychological reports considered
in the amnesty process of the Truth and Reconciliation
Commission. SA Journal of Psychology, 30(1), pp. 50–52.
Nicholls, M., Viviers, A.M., & Visser, D. (2009). Validation of a test
battery for the selection of call centre operators in a
communications company. SA Journal of Psychology, 39(1), pp.
19–31.
Nolte, K., Steyn, B. J., Krüger, P. E., & Fletcher, L. (2016).
Mindfulness, psychological well-being and doping in talented
young high-school athletes. South African Journal for Research in
Sport, Physical Education and Recreation, 38(2), pp. 153−165.
Nolte, P.L., Wessels, J.C., & Prinsloo, C.E. (2009). The effect of
therapeutic recreation activities on students’ appraisals of their
emotions during academically demanding times. Journal for New
Generation Sciences, 7(1), pp. 160–175.
Nsamenang, A.B. (2007). Origins and development of scientific
psychology in Afrique Noire. In M.J. Stevens and D. Wedding
(Eds., under the supervision of John G. Adair). Psychology:
IUPsyS global resource (2007 ed.). London: Psychology Press.
Available at
https://www.unige.ch/fapse/SSE/teachers/dasen/Nsamenang2007.p
Nunnally, J.C. (1978). Psychometric theory. New York: McGraw-Hill.
Nunnally, J. C., & Bernstein, I.H. (1994). Psychometric theory (3rd
ed.). New York: McGraw Hill.
Nuttall, E.V., Romero, I., & Kalesnik, J. (Eds.) (1992). Assessing and
screening preschoolers: Psychological and educational
dimensions. United States of America: Allyn and Bacon.
Nwoye, A. (2015). What is African psychology the psychology of?
Theory & Psychology, 25(1), pp. 96–116.
Nzimande, B. (1995). Culture fair testing? To test or not to test? Paper
delivered at the Congress on Psychometrics for Psychologists and
Personnel Practitioners, Pretoria, 5–6 June 1995.
O’Boyle, E.H., Humphrey, R.H., Pollack, J.M., Hawver, T.H., & Story,
P.A. (2011). The relation between emotional intelligence and job
performance: A meta-analysis. Journal of Organizational Behavior,
32, pp. 788–818.
Odendaal, I.E., Brink, M. & Theron, L.C. (2011). Rethinking
Rorschach interpretation: an exploration of resilient black South
African adolescents’ personal constructions. SA Journal of
Psychology, 41(4), pp. 528–539.
OECD (2011). PISA 2009 results: Students on line: Digital
technologies and performance (Volume VI).
http://dx.doi.org/10.1787/9789264112995-en
Olivier, N.M. & Swart, D.J. (1974). Manual for the Aptitude Tests for
School Beginners (ASB). Pretoria: HSRC.
Ones, D. (2008). Recent Assessment Centre research: A recap.
Keynote address presented at the 28th Annual Conference of the
AC study Group, 12–14 March, Protea Hotel, Stellenbosch.
Oosthuizen, R.M. & Koortzen, P. (2007). An empirical investigation of
job and family stressors amongst firefighters in the South African
context. SA Journal of Industrial Psychology, 33(1), pp. 49–58.
Osborn, D.S., & Zunker, V.G. (2016). Using assessment results for
career development (9th ed.). Boston, MA: Cengage Learning.
Outhoff, K. (2010). The pharmacology of anxiolytics. South African
Family Practice, 52(2), pp. 99−105. doi:
10.1080/20786204.2010.10873947
Owen, K. (1986). Toets en itemsydigheid: Toepassing van die Senior
Aanlegtoetse, Meganiese Insigtoets en Skolastiese Bekwaamheid
Battery op Blanke, swart en kleurling en Indier Technikon
studente [Test and item bias: The application of the Senior Aptitude
Tests, Mechanical Insight Test and Scholastic Proficiency Battery to
white, black, coloured and Indian technikon students]. Pretoria:
Human Sciences Research Council.
Owen, K. (1989a). Test and item bias: The suitability of the Junior
Aptitude tests as a common test battery for white, Indian and black
pupils in Standard 7. Pretoria: HSRC.
Owen, K. (1989b). The suitability of Raven’s Progressive Matrices for
various groups in South Africa. Personality and Individual
Differences, 13, pp. 149–159.
Owen, K. (1991). Test bias: the validity of the Junior Aptitude Tests
(JAT) for various population groups in South Africa regarding the
constructs measured. SA Journal of Psychology, 21(2), pp. 112–
118.
Owen, K. (1992a). Test-item bias: Methods, findings and
recommendations. Pretoria: HSRC.
Owen, K. (1992b). The suitability of Raven’s Standard Progressive
Matrices for various groups in South Africa. Personality and
Individual Differences, 13, pp. 149–159.
Owen, K. (1996). Aptitude tests. In K. Owen & J.J. Taljaard (Eds.).
Handbook for the use of psychological and scholastic tests of the
HSRC (pp. 191–270). Pretoria: HSRC.
Owen, K. (1998). The role of psychological tests in education in South
Africa: Issues, controversies and benefits. Pretoria: HSRC.
Owen, K. & Taljaard, J.J. (1995). Handleiding vir die gebruik van
sielkundige en skolastiese toetse van die RGN [Manual for the use of
psychological and scholastic tests of the HSRC]. Pretoria: HSRC.
Owen, K. & Taljaard, J.J. (Eds.) (1996). Handbook for the use of
psychological and scholastic tests of the HSRC (revised edition).
Pretoria: HSRC.
Ozminkowski, R. J., Dunn, R. L., Goetzel, R. Z., Cantor, R. I.,
Murnane, J., & Harrison, M. (1999). A return on investment
evaluation of the Citibank, NA, Health Management Program.
American Journal of Health Promotion, 14(1), pp. 31−43.
Pade, H. (2014). The evolution of psychological testing: Embarking on
the age of digital assessment. In R. Valle & J. Klimo (Eds.), The
changing faces of therapy: Innovative perspectives on clinical
issues and assessment. Available at: http://www.helloq.com
Park, G., Schwartz, H.A., Eichstaedt, J.C., Kern, M.L., Kosinski, M.,
Stillwell, D.J., Ungar, L.H., & Seligman, M.E. (2015, June).
Automatic personality assessment through social media language.
Journal of Personality and Social Psychology, 108(6), pp.
934−952.
Parshall, C.G., Harmes, J.C., Davey, T., & Pashley, P. (2010).
Innovative items for computerised testing. In W.J. van der
Linden & C.A.W. Glas (Eds.), Computerised adaptive testing: Theory
and practice (2nd ed.). Norwell, MA: Kluwer Academic Publishers.
Parsons, F. (1909). Choosing a vocation. New York: Houghton Mifflin.
Pasche, S., & Myers, B. (2012). Substance misuse trends in South
Africa. Human Psychopharmacology: Clinical and Experimental,
27(3), pp. 338−341. doi: 10.1002/hup.2228
Paterson, H., & Uys, K. (2005). Critical issues in psychological test
use in the South African workplace. SA Journal of Industrial
Psychology, 31(3), pp. 12–22.
Patrick, J. (2000). Mental status examination rapid record form.
Retrieved on 20 January 2009 from
www.nev.dgp.org.au/files/programsupport/mentalhealth/Mental%20
%20form.pdf?
Patton, W. & McMahon, M. (1999). Career development and systems
theory: A new relationship. Pacific Grove, CA: Brooks/Cole.
Patton, W., McMahon, M., & Watson, M.B. (2017). Career
development and systems theory: Enhancing our understanding of
career. In G.B. Stead & M.B. Watson (Eds.), Career psychology in
the South African context (pp. 85−104). Pretoria, South Africa: Van
Schaik.
Paunonen, S.V. & Ashton, M.C. (1998). The structured assessment of
personality across cultures. Journal of Cross-Cultural Psychology,
29, pp. 150–170.
Pervin, L.A. (1983). The stasis and flow of behavior: Toward a theory
of goals. Nebraska Symposium on Motivation, pp. 1–53.
Pervin, L.A. (Ed.). (1989). Goal concepts in personality and social
psychology. Hillsdale, NJ: Erlbaum.
Petersen, I., Louw, J., & Dumont, K. (2009). Adjustment to university
and academic performance among disadvantaged students in
South Africa. Educational Psychology, 29, pp. 99–115.
Petkoon, L. & Roodt, G. (2004). The discriminant validity of the culture
assessment instrument: A comparison of company sub-cultures.
SA Journal of Industrial Psychology, 30(1), pp. 74–82.
Petrill, S.A., Pike, A., Price, T. & Plomin, R. (2004). Chaos in the
home and socioeconomic status are associated with cognitive
development in early childhood: Environmental mediators
identified in a genetic design. Intelligence, 32, pp. 445–460.
Piaget, J. & Inhelder, B. (1971). Mental imagery in the child
(Trans. P.A. Chilton). London: Routledge & Kegan Paul.
Piedmont, R.L., Bain, E., McCrae, R.R., & Costa, P.T. (2002). The
applicability of the five-factor model in a sub-Saharan culture: The
NEO-PI-R in Shona. In R.R. McCrae & J. Allik (Eds.), The five-
factor model of personality across cultures, pp. 155–173. New
York: Kluwer Academic/Plenum.
Pienaar, I., Shuttleworth-Edwards, A.B., Klopper, C.C., & Radloff, S.
(2017). Wechsler Adult Intelligence Scale – Fourth Edition
preliminary normative guidelines for educationally disadvantaged
Xhosa-speaking individuals. South African Journal of Psychology,
47(2), pp. 159−170.
Pieterse, A.M. (2005). The relationship between time perspective and
career maturity for Grade 11 and 12 learners. Unpublished
master’s thesis, University of the Free State, Bloemfontein, South
Africa.
Pietersen, H.J. (1992). Die waarde van die takseersentrum: ’n repliek
[The value of the assessment centre: A reply]. Tydskrif vir
Bedryfsielkunde, 18(1), pp. 20–22.
Pillay, J. (2011). Challenges counsellors face while practising in
South African schools. SA Journal of Psychology, 41(3), pp. 351–
362.
Pillay, A.L., & Johnston, E.R. (2011). Intern clinical psychologists’
experiences of their training and internship placements. SA
Journal of Psychology, 41(1), pp. 74–82.
Piotrowski, C., & Keller, J.W. (1989). Psychological testing in
outpatient mental health facilities: A national study. Professional
Psychology: Research and Practice, 20, pp. 423–425.
Potgieter, A., & Roodt, G. (2004). Measuring a customer intimacy
culture in a value discipline context. SA Journal of Human Resource
Management, 2(3), pp. 25–31.
Plomin, R., & Petrill, S.A. (1997). Genetics and intelligence: What’s
new? Intelligence, 24(1), pp. 53–77.
Plug, C. (1996). An evaluation of psychometric test construction and
psychological services supported by the HSRC. Unpublished
manuscript, University of South Africa.
Poortinga, Y.H., & Klieme, E. (2016). The history and current status of
testing across cultures and countries. In F.T. Leong, D. Bartram,
F.M. Cheung, K.F. Geisinger, & D. Iliescu (Eds.), The ITC
international handbook of testing and assessment (pp. 14−34).
New York, NY: Oxford University Press.
Popp, S., Ellison, J., & Reeves, P. (1994). Neuropsychiatry of the
anxiety disorders. In J. Ellison, C. Weinstein & T. Hodel-Malinofsky
(Eds.), The psychotherapist’s guide to neuropsychiatry (pp. 369–
408). Washington, DC: American Psychiatric Press.
Popp, E. C., Tuzinsky, K., & Fetzer, M. (2016). Actor or avatar?
Considerations in selecting appropriate formats for assessment
content. In F. Drasgow (Ed.), Technology and testing: Improving
educational and psychological measurement (pp. 78−102). New
York: Routledge.
Poropat, A.E. (2009). A meta-analysis of the five-factor model of
personality and academic performance. Psychological Bulletin,
135(2), pp. 322–338.
Porteus, S.D. (1937). Primitive intelligence and environment. New
York: Macmillan.
Potter, C., Kaufman, W., Delacour, J., Mokone, M., Van der Merwe, E.
& Fridjhon, P. (2009). Three-dimensional spatial perception and
academic performance in engineering graphics: a longitudinal
investigation. SA Journal of Psychology, 39(1), pp. 109–121.
Powers, D.E., & O’Neill, K. (1993). Inexperienced and anxious
computer users: Coping with a computer administered test of
academic skills. Educational Assessment, 1(2), pp. 153–173.
Preamble to the Constitution of the World Health Organization as
adopted by the International Health Conference, New York, 19–22
June, 1946; signed on 22 July 1946 by the representatives of 61
States (Official Records of the World Health Organization, no. 2, p.
100) and entered into force on 7 April 1948.
Prediger, D.J. (1974). The role of assessment in career guidance. In
E.L. Herr (Ed.), Vocational guidance and human development.
Boston: Houghton Mifflin.
Preston, P. (2005). Testing children: A practitioner’s guide to the
assessment of mental development in infants and young children.
Germany: Hogrefe & Huber Publishers.
Price, L.R. (2017). Psychometric methods: Theory into practice. New
York: Guilford Press.
Prince, J.P., & Heiser, L.J. (2000). Essentials of career interest
assessment. New York: John Wiley.
Professional Board for Psychology (2010a). Form 94. Training and
examination guidelines for psychometrists. Pretoria: Health
Professions Council of South Africa. Retrieved on 1 December
2012 from
http://www.hpcsa.co.za/downloads/psycho_policy/form_94.pdf
Professional Board for Psychology (2010b). Form 208. Policy on the
classification of psychometric measuring devices, instruments,
methods and techniques. Pretoria: Health Professions Council of
South Africa. Retrieved on 1 December 2012 from
http://www.hpcsa.co.za/downloads/registration_forms/2010_registra
01_june_2010_exco.pdf
Professional Board for Psychology (2010c). Form 207. List of tests
classified as psychological tests. Pretoria: Health Professions
Council of South Africa. Retrieved on 1 December 2012 from
http://www.hpcsa.co.za/downloads/psychology/psychom_form_207
Professional Board for Psychology (2010d). Guidelines for reviewers
for the classification of psychometric tests, measuring devices,
instruments, methods and techniques in South Africa. Pretoria:
Health Professions Council of South Africa.
Psytech International. (2000). 15FQ+ Technical Manual. Pulloxhill,
Bedfordshire: Psychometrics Limited.
Psytech SA (2017). Psytech South Africa catalogue: products,
services and training offered by Psytech South Africa.
Johannesburg: Psytech SA.
Putka, D. (2017). Reliability. In J. Farr, & N. Tippins (Eds.), Handbook
of employee selection (2nd ed.) (pp. 3−33). Mahwah, NJ:
Lawrence Erlbaum.
Quinn, R.E. & Rohrbaugh, J. (1981). A competing values approach to
organisational effectiveness. Public Productivity Review, 5, pp.
122–140.
Rabie, J. (2005). Standaardisering van die Meyer-
belangstellingsvraelys op volwassenes [Standardization of the
Meyer Interest Questionnaire on adults]. Unpublished master’s
thesis, University of Stellenbosch, Stellenbosch, South Africa.
Radecki, L., Sand-Loud, N., O’Connor, K.G., Sharp, S., & Olson, L.M.
(2011). Trends in the use of standardized tools for developmental
screening in early childhood: 2002−2009. Pediatrics, 128, pp.
14−19.
Radloff, L.S. (1977). The CES-D scale: A self-report depression scale
for research in the general population. Applied Psychological
Measurement, 1(3), pp. 385−401. doi:
10.1177/014662167700100306
Rahim, M.A. (Ed.) (1989). Managing conflict: An integrative approach.
New York: Praeger.
Raiford, S.E., Weiss, L.G., Rolfhus, E., & Coalson, D. (2005/2008).
Wechsler Intelligence Scale for Children – Fourth Edition (WISC-
IV): Technical report #4. Pearson Education. Retrieved on 13
September 2018 from
http://images.pearsonclinical.com/images/assets/WISC-
IV/80720_WISCIV_Hr_r4.pdf
Ramesar, S., Koortzen, P., & Oosthuizen, R.M. (2009). The
relationship between emotional intelligence and stress
management. SA Journal of Industrial Psychology, 35(1), pp. 1–
10.
Ramsay, L. J., Taylor, N., De Bruin, G. P., & Meiring, D. (2008). The
Big Five personality factors at work: A South African validation
study. In J. Deller (Ed.), Research contributions to personality at
work. Munich & Mering, Germany: Rainer Hampp Verlag.
Ratele, K. (2004). About black psychology. In D. Hook, N. Mkhize, P.
Kiguwa, A. Collins, E. Burman & I. Parker (Eds.), Critical
Psychology (pp. 389−414). Lansdowne, South Africa: University of
Cape Town Press.
Ratele, K. (2017). Four (African) psychologies. Theory & Psychology,
27(3), pp. 313–327.
Raven, J.C. (1965). Progressive matrices. London: H.K. Lewis.
Raykov, T., Marcoulides, G. A., & Li, C.H. (2012). Measurement
invariance for latent constructs in multiple populations: A critical
view and refocus. Educational and Psychological Measurement,
72(6), pp. 954–974.
Reckase, M.D. (1988). Computerized adaptive testing: a good idea
waiting for the right technology. Paper presented at the meeting of
the American Educational Research Association, New Orleans.
Reeve, C.L. & Lam, H. (2007). The relation between practice effects,
test-taker characteristics and degree of g-saturation. International
Journal of Testing, 7(2), pp. 225–242.
Reger, M., Welsh, R., Razani, J., Martin, D., & Boone, K. (2002). A
meta-analysis of the neuropsychological sequelae in HIV infection.
Journal of the International Neuropsychological Society, 8(3), pp.
410–424.
Reis, H.T., Sheldon, K.M., Gable, S.L., Roscoe, J. & Ryan, R.M.
(2000). Daily well-being: the role of autonomy, competence, and
relatedness. Personality and Social Psychology Bulletin, 26(4), pp.
419–435.
Republic of South Africa (1998). Employment Equity Act. Government
Gazette, 400 (19370), Cape Town, 19 October.
Republic of South Africa (2000). Promotion of Equality and Prevention
of Unfair Discrimination Act. Government Gazette, 416 (20876),
Cape Town, 9 February.
Republic of South Africa (2013). Employment Equity Amendment Act
No. 47 of 2013. Government Gazette, 583 (37238), Cape Town, 16
January 2014. Available at:
http://www.labour.gov.za/DOL/downloads/legislation/acts/employme
equity/eea_amend2014.pdf
Retief, A.I. (1987). Thematic apperception testing across cultures:
Tests of selection versus tests of inclusion. SA Journal of
Psychology, 17, pp. 47–55.
Retief, A. (1992). The cross-cultural utility of the SAPQ: Bias or fruitful
difference? SA Journal of Psychology, 22, pp. 202–207.
Reuning, H. & Wortley, W. (1973). Psychological studies of the
Bushmen. Psychologia Africana, Monograph Supplement 7(113),
pp. 1−113.
Reynolds, C.R. & Hickman, J.A. (2004). Draw-a-Person Intellectual
Ability Test for children, adolescents, and adults (DAP: IQ). Los
Angeles: Western Psychological Services.
Richards, S.D., Pillay, J. & Fritz, E. (2012). The use of sand-tray
techniques by school counsellors to assist children with emotional
and behavioural problems. The Arts in Psychotherapy, 39, pp.
367−373.
Richardson, J., Martin, N., Danley, K., Cohen, M., Carson, V., Sinclair,
B., Racenstein, J., Reed, R., & Levine, A. (2002).
Neuropsychological functioning in a cohort of HIV infected women:
Importance of antiretroviral therapy. Journal of the International
Neuropsychological Society, 8(6), pp. 781–793.
Richter, L.M. (1989). Household density, family size and the growth
and development of black children: a cross-sectional study from
infancy to middle childhood. SA Journal of Psychology, 19(4), pp.
191–198.
Richter, L.M. & Griesel, R.D. (1988). Bayley Scales of Infant
Development: Norms for Interpreting the performance of black
South African infants. Pretoria: Institute of Behavioural Sciences.
Richter, L.M. & Grieve, K.W. (1991). Cognitive development in young
children from impoverished African families. Infant Mental Health
Journal, 12(2), pp. 88–102.
Richter, L., Mabaso, M., & Hsiao, C. (2016). Predictive power of
psychometric assessments to identify young learners in need of
early intervention: Data from the Birth to Twenty Plus (Bt20+)
cohort, South Africa. South African Journal of Psychology, 46(2),
pp. 175−190.
Ring, L., Hofer, S., McGee, H., Hickey, A. & O’Boyle, C.A. (2007).
Individual quality of life: can it be accounted for by psychological or
subjective well-being? Social Indicators Research, 86, pp. 443–
461.
Robertson, G.J. & Eyde, L.D. (1993). Improving test use in the United
States: The development of an interdisciplinary casebook.
European Journal of Psychological Assessment, 9(2), pp. 137–
146.
Robinson, M. (1986). ’n Ondersoek na die waarde van die Junior
Suid-Afrikaanse Individuele Skale (JSAIS) as diagnostiese
hulpmiddel vir leergestremde kinders met leesprobleme
[Investigating the value of the Junior South African Individual
Scales (JSAIS) as diagnostic aid for learning-disabled children with
reading problems]. Unpublished master’s dissertation, University of
Pretoria.
Robinson, M. (1989). Bylae tot die Handleiding vir die Junior Suid-
Afrikaanse Individuele Skale (JSAIS): Tegniese besonderhede en
normtabelle vir 6- tot 8-jarige leerlinge [Appendix to the Manual for
the Junior South African Individual Scales (JSAIS): Technical
details and tables of norms for the 6- to 8-year-old pupils]. Pretoria:
HSRC.
Robinson, M. (1994). Manual for the Individual Scale for General
Scholastic Aptitude (ISGSA). Part III: Norm tables. Pretoria: HSRC.
Robinson, M. (1996). Manual for the Individual Scale for General
Scholastic Aptitude (ISGSA). Part I: Background and
standardization. Pretoria: HSRC.
Robinson, M. & Hanekom, J.M.D. (1991). Die JSAIS en ASB as
hulpmiddels by die evaluering van skoolgereedheid [The use of
the JSAIS and ASB for the evaluation of school readiness].
Pretoria: HSRC.
Robison, J. (2010). The business case for well-being. Business
Journal, June 2010. Available at
http://news.gallup.com/businessjournal/139373/business-case-
wellbeing.aspx
Roe, R.A. (2008). The impact of testing on people and society. Paper
presented at the 6th Conference of the International Test
Commission, Liverpool, UK, 14–16 July 2008. (Can be accessed
online at www.intestcom.org).
Roger, E.A.S. (1989). The Number and Quantity Concepts subtest of
the Junior South African Individual Scale as predictor of early
mathematical achievement. Unpublished master’s dissertation,
University of Port Elizabeth.
Rogers, T.B. (1997). Teaching ethics and test standards in a
psychological course: A Test Taker’s Bill of Rights. Teaching of
Psychology, 24(1), pp. 41–46.
Roivainen, E. (2012). Economic, educational and IQ gains in eastern
Germany 1990–2006. Intelligence, 40, pp. 571–575.
Roodt, G. (1991). Die graad van werkbetrokkenheid as voorspeller
van persoonlike welsyn: ’n studie van bestuurders [The degree of
job involvement as predictor of personal well-being: A study of
managers]. Unpublished doctoral dissertation, University of the
Free State.
Roodt, G. (2007). The reliability of structured interviews: Competency
rating interviews as a case in point. Paper presented at the 27th
Annual Conference of the AC study Group, 14–16 March, Protea
Hotel, Stellenbosch.
Roodt, G. (2008). Reliability and validity analyses. In S. Schlebusch &
G. Roodt (Eds.). Assessment Centres: Unlocking potential for
growth. Randburg: KnowRes Publishers.
Roos, V. (2008). The Mmogo-method®: Discovering symbolic
community interactions. Journal of Psychology in Africa, 18(4), pp.
659−668.
Roos, V. (2012). The Mmogo-method®: An exploration of experiences
through visual projections. Qualitative Research in Psychology,
9(3), pp. 249−261.
Rosander, P., Bäckström, M., & Stenberg, G. (2011). Personality traits
and general intelligence as predictors of academic performance: A
structural equation modelling approach. Learning and Individual
Differences, 21(5), pp. 590–596.
Rosen, S., Simon, J.L., Thea, D.M. & Vincent, J.R. (2000). Care and
treatment to extend the working lives of HIV-positive employees:
Calculating the benefits to business. South African Journal of
Science, 96(6), pp. 300–304.
Rosen, S., Simon, J.L., Vincent, J.R., MacLeod, W., Fox, M., & Thea,
D.M. (2003). AIDS is your business. Harvard Business Review,
February, pp. 81–87.
Rosen, S., Vincent, J.R., MacLeod, W., Fox, M., Thea, D.M. & Simon,
J.L. (2004). The cost of HIV/AIDS to businesses in southern Africa.
AIDS (Official Journal of the International AIDS Society), 18(2), pp.
317–324.
Rosenthal, R. & Rosnow, R.L. (1991). Essentials of behavioral
research: Method and data analysis (2nd edition). New York:
McGraw-Hill.
Ross, S., Krukowski, R., Putnam, S., & Adams, K. (2003). Memory
assessment scales in detecting probable malingering in mild head
injury. Paper presented at the 26th Annual International
Neuropsychological Society Mid-year Conference, Berlin,
Germany.
Roth, M., McCaul, E., & Barnes, K. (1993). Who becomes an ‘at-risk’
student? The predictive value of a kindergarten screening battery.
Exceptional Children, 59(4), pp. 348–358.
Runyon, R.P. & Haber, A. (1980). Fundamentals of behavioral
statistics. Reading, MA: Addison-Wesley Publishing Company.
Ryan Krane, N.E., & Tirre, W.C. (2005). Ability assessment in career
counseling. In S.D. Brown & R.W. Lent (Eds.), Career
development and counseling: Putting theory and research to work
(pp. 330−352). Hoboken, NJ: John Wiley & Sons.
Ryff, C.D. (1989). Happiness is everything, or is it? Explorations on
the meaning of psychological well-being. Journal of Personality
and Social Psychology, 57, pp. 1069−1081. doi:10.1037/0022-
3514.57.6.1069
Ryff, C.D. & Keyes, C.L.M. (1995). The structure of psychological well-
being revisited. Journal of Personality and Social Psychology,
69(4), pp. 719–727.
Ryff, C.D. & Singer, B. (2003). Flourishing under fire: Resilience as a
prototype of challenged thriving. In C.L.M. Keyes & J. Haidt (Eds.),
Flourishing: Positive psychology and the life well-lived.
Washington: American Psychological Association.
SA: 1 in 3 mentally ill. (2007, October). Retrieved September 27,
2008, from
http://www.health24.com/mind/Mental_health_in_SA/1284-
1031,42638.asp
Salgado, J.F. (2017). Personnel selection. Oxford Research
Encyclopedias, Psychology. Oxford University Press: USA.
Retrieved on 13 September 2018 from:
http://dx.doi.org/doi:10.1093/acrefore/9780190236557.013.8
Salgado, J.F., Anderson, N., Moscoso, S., Bertua, C., De Fruyt, F., &
Rolland, J.P. (2003). A meta-analytic study of general mental
ability validity for different occupations in the European
Community. Journal of Applied Psychology, 88(6), pp. 1068–1081.
Salovey, P. & Mayer, J.D. (1990). Emotional intelligence. Imagination,
Cognition and Personality, 9(3), pp. 185–211.
Samelson, F. (1979). Putting psychology on the map: Ideology and
intelligence testing. In A.R. Buss (Ed.), Psychology in social
context. New York: Irvington.
Sampson, J.P. (2009). Modern and postmodern career theories: The
unnecessary divorce. The Career Development Quarterly, 58, pp.
91–96.
Sanderson, K. & Andrews, G. (2006). Common mental disorders in
the workforce: recent findings from descriptive and social
epidemiology. Canadian Journal of Psychiatry, 51(2), pp. 63–76.
Sarason, I.G. (1961). Test anxiety and the intellectual
performance of college students. Journal of Educational
Psychology, 52, pp. 201–206.
Sattler, J.M. (1988). Assessment of Children. (3rd ed.). San Diego,
CA: Author.
Savickas, M.L. (1989). Career-style assessment and counseling. In T.
Sweeney (Ed.), Adlerian counseling: A practical approach for a
new decade (pp. 289−320). Muncie, IN: Accelerated Development
Press.
Savickas, M.L. (2005). The theory and practice of career construction.
In S. Brown, & R. Lent (Eds.), Career development and
counseling: Putting theory and research to work (pp. 42–70).
Hoboken, NJ: John Wiley.
Savickas, M.L. (2010). Foreword: Best practices in career intervention.
In K. Maree (Ed.), Career counselling: Methods that work (pp. xi–
xii). Cape Town, South Africa: Juta.
Savickas, M.L. (2011). Prologue: Reshaping the story of career
counselling. In J.G. Maree (Ed.), Shaping the story: A guide to
facilitating narrative counselling (pp. 1–3). Rotterdam: Sense.
Savickas, M. L., Nota, L., Rossier, J., Dauwalder, J., Duarte, M. E.,
Guichard, J. et al. (2009). Life designing: A paradigm for career
construction in the 21st century. Journal of Vocational Behavior,
75(3), pp. 239–250.
Scalise, K., & Gifford, B. (2006). Computer-based assessment in e-
learning: A framework for constructing ‘intermediate constraint’
questions and tasks for technology platforms. Journal of
Technology, Learning and Assessment, 4(6). Retrieved on 21
January 2018 from http://www.jtla.org.
Schaap, P. (2003). The construct comparability of the PIB/SPEEX
stress index for job applicants from diverse cultural groups in
South Africa. SA Journal of Psychology, 33(2), pp. 95–102.
Schaap, P. (2011). The differential item functioning and structural
equivalence of a nonverbal cognitive ability test for five language
groups. SA Journal of Industrial Psychology, 37(1), pp. 1–16.
Schaap, P. & Basson, J.S. (2003). The construct equivalence of the
PIB/SPEEX motivation index for job applicants from diverse
cultural backgrounds. SA Journal of Industrial Psychology, 29(2),
pp. 49–59.
Schaufeli, W. B., & Bakker, A. (2003). Utrecht Work Engagement
Scale: Preliminary manual. Occupational Health Psychology Unit,
Utrecht University.
Schaufeli, W. B., & Bakker, A.B. (2004). Job demands, job resources,
and their relationship with burnout and engagement: A multi-
sample study. Journal of Organizational Behavior, 25, pp. 293–
315.
Schaufeli, W.B., Bakker, A.B., & Van Rhenen, W. (2009). How
changes in job demands and resources predict burnout, work
engagement, and sickness absenteeism. Journal of Organizational
Behavior, 30, pp. 893–917. doi: 10.1002/job.595
Schaufeli, W.B., Martínez, I.M., Pinto, A.M., Salanova, M., & Bakker,
A.B. (2002). Burnout and engagement in university students.
Journal of Cross-Cultural Psychology, 33(5), pp. 464–481.
Schaufeli, W.B., Salanova, M., González-Romá, V., & Bakker, A.B.
(2002). The measurement of engagement and burnout: A two
sample confirmatory factor analytic approach. Journal of
Happiness Studies, 3, pp. 71–92.
Schaufeli, W.B., Taris, T.W., & Van Rhenen, W. (2008). Workaholism,
burnout, and work engagement: Three of a kind or three different
kinds of employee well-being? Applied Psychology: An
International Review, 57(2), pp. 173–203.
Scheier, M. F., Carver, C. S., & Bridges, M.W. (1994). Distinguishing
optimism from neuroticism (and trait anxiety, self-mastery, and
self-esteem): A reevaluation of the Life Orientation Test. Journal of
Personality and Social Psychology, 67, pp. 1063–1078.
Schepers, J.M. (1992). Toetskonstruksie: Teorie en Praktyk [Test
Construction: Theory and Practice]. Johannesburg: RAU Drukpers.
Schepers, J.M. (1999). The Bell-curve revisited: a South African
perspective. Journal of Industrial Psychology, 25(2), pp. 52–63.
Schepers, J.M. (2005). Technical manual: Locus of control inventory
(Revised edition 2003). Johannesburg: Jopie van Rooyen &
Partners.
Schepers, J.M., & Hassett, C.F. (2006). The relationship between the
fourth edition (2003) of the Locus of Control Inventory and the
Sixteen Personality Factor Questionnaire (version 5). SA Journal
of Industrial Psychology, 32, pp. 9–18.
Schlebusch, S. & Roodt, G. (Eds.) (2008). Assessment Centres:
Unlocking potential for growth. Johannesburg: KnowRes
Publishing.
Schmidt, C.S. (1997). Engaging the dilemmas of psychological
testing. Paper presented at the Annual Conference of the Society
of Industrial Psychology, CSIR Conference Centre, Pretoria.
Schmidt, F.L. & Hunter, J.E. (1998). The validity and utility of selection
methods in personnel psychology: Practical and theoretical
implications of 85 years of research findings. Psychological
Bulletin, 124, pp. 262–274.
Schmidt, F.L., Oh, I., & Shaffer, J.A. (2016). The validity and utility of
selection methods in personnel psychology: Practical and
theoretical implications of 100 years of research findings. Fox
School of Business research paper. Retrieved on 13 September
2018 from https://ssrn.com/abstract=2853669.
Schmitt, N. W., Arnold, J. D., & Nieminen, L. (2017). Validation
strategies for primary studies. In J.L. Farr & N.T. Tippins (Eds.),
Handbook of employee selection (pp. 51–71) (2nd ed). New York,
NY, US: Routledge/Taylor & Francis Group.
Schuerger, J.M. (2002). The Sixteen Personality Factor Questionnaire
(16PF). In C.E. Watkins, Jr. & V.L. Campbell (Eds.), Testing and
assessment in counseling practice (2nd ed.), pp. 73–110.
Mahwah, NJ: Lawrence Erlbaum Publishers.
Sechrest, L. (1978). Incremental validity. In D.N. Jackson & S.
Messick (Eds.), Problems in human assessment. Huntington, NY:
Robert E. Krieger Publishing Company.
Seligman, L. (1994). Developmental career counselling and
assessment. Thousand Oaks, CA: Sage.
Seligman, M.E.P. (2003a). Foreword: The past and future of positive
psychology. In C.L.M. Keyes & J. Haidt (Eds.), Flourishing:
Positive psychology and the life well-lived. Washington: American
Psychological Association.
Seligman, M.E.P. (2003b). Authentic happiness: Using the new
positive psychology to realise your potential for lasting fulfilment.
London: Nicholas Brealey.
September, S.J., Rich, E.G. & Roman, N.V. (2016). The role of
parenting styles and socio-economic status in parents’ knowledge
of child development. Early Child Development and Care, 186(7),
pp. 1060−1078, doi: 10.1080/03004430.2015.1076399
Serious Games Institute (2017). About SGI projects. Retrieved from:
http://www.seriousgamesinstitute.co.uk/applied-
research/About-SGI-Projects.aspx
Shabalala, S. & Jasson, A. (2011). PTSD symptoms in intellectually
disabled victims of sexual assault. SA Journal of Psychology,
41(4), pp. 424–436.
Sharf, R.S. (2006). Applying career development theory to
counselling. London: Thomson.
Sheldon, K.M., Kashdan, T.B., & Steger, M.F. (2011). Designing
positive psychology: Taking stock and moving forward. Oxford:
Oxford University Press.
Sherwood, E.T. (1957). On the designing of TAT pictures, with special
reference to a set for an African people assimilating western
culture. Journal of Social Psychology, 45, pp. 161–190.
Shonkoff, J.P., & Meisels, S.J. (2000). Handbook of early childhood
intervention (2nd ed.). Cambridge: Cambridge University Press.
Shuttleworth-Edwards, A.B. (2012). Guidelines for the use of the
WAIS-IV with WAIS-III cross-cultural normative indications. SA
Journal of Psychology, 42(3), pp. 399–410.
Shuttleworth-Edwards, A.B. (2017). Countrywide norms declared
obsolete: Best practice alert for IQ testing in a multicultural context.
South African Journal of Psychology, 47(1), pp. 3−6. doi:
http://dx.doi.org/10.1177/0081246316684465
Shuttleworth-Edwards, A.B., Gaylard, E.K. & Radloff, S.E. (2013).
WAIS-III test performance in the South African context: Extension
of a prior cross-cultural normative database. In S. Laher & K.
Cockcroft (Eds.), Psychological assessment in South Africa:
Research and applications (pp. 17−32). Johannesburg: Wits
University Press.
Shuttleworth-Edwards, A., Kemp, R., Rust, A., Muirhead, J., Hartman,
N., & Radloff, S. (2004). Vanishing ethnic differences in the wake of
acculturation: A review and preliminary indications on WAIS-III test
performance in South Africa. Journal of Clinical and Experimental
Neuropsychology, 26(7), pp. 903–920.
Shuttleworth-Edwards, A.B., Van der Merwe, A.S., Van Tonder, P. &
Radloff, S.E. (2013). WISC-IV test performance in the South African
context: A collation of cross-cultural norms. In S. Laher & K.
Cockcroft (Eds.), Psychological assessment in South Africa:
Research and applications (pp. 33−47). Johannesburg: Wits Press.
Shuttleworth-Edwards, A.B., Whitefield-Alexander, V.J., Radloff, S.
E., Taylor, A. M., & Lovell, M.R. (2009). Computerized
neuropsychological profiles of South African versus United States
athletes: A basis for commentary on cross-cultural norming issues
in the sports concussion arena. The Physician and Sports
Medicine, 37, pp. 45–52.
Shuttleworth-Jordan, A.B. (1996). On not reinventing the wheel: a
clinical perspective on culturally relevant test usage. SA Journal of
Psychology, 26(2), pp. 96–102.
Sieberhagen, C., Pienaar, J., & Els, C. (2011). Management of
employee wellness in South Africa: Employer, service provider and
union perspectives. SA Journal of Human Resource
Management/SA Tydskrif vir Menslikehulpbronbestuur, 9(1), a305.
doi: 10.4102/sajhrm.v9i1.305
Simner, M.L. & Goffin, R.D. (2003). A position statement by the
International Graphonomics Society on the use of graphology in
personnel selection testing. International Journal of Testing, 3(4),
pp. 353–364.
Sinclair, J. (2012, February). Mercer 2011 national survey of
employer-sponsored health plans. Slide presentation at the
meeting of Mercer, Columbus: Ohio Program.
Sireci, S.G. (2011). Evaluating test and survey items for bias across
languages and cultures. In D. Matsumoto & F.J.R. van de Vijver,
Cross-cultural research methods in psychology (pp. 216–240).
Cambridge, NY: Cambridge University Press.
Sireci, S. & Gandara, F. (2016). Testing in educational and
developmental settings. In F.T. Leong, D. Bartram, F.M. Cheung,
K.F. Geisinger, & D. Iliescu (Eds.), The ITC international handbook
of testing and assessment (pp. 188–202). New York, NY: Oxford
University Press.
Skuy, M., Zolezzi, S., Mentis, M., Fridjhon, P., & Cockcroft, K. (1996).
Selection of advantaged and disadvantaged South African
students for university admission. South African Journal of Higher
Education, 10(1), pp. 110–117.
Skuy, M., Schutte, E., Fridjhon, P. & O’Carroll, S. (2001). Suitability of
published neuropsychological test norms for urban African
secondary school students in South Africa. Personality and
Individual Differences, 30(8), pp. 1413–1425.
Smit, G.J. (1996). Psychometrics: Aspects of measurement. Pretoria:
Kagiso.
Smith, P.B. (2004). Acquiescent response bias as an aspect of
cultural communication style. Journal of Cross-Cultural
Psychology, 35, pp. 50–61. doi: 10.1177/0022022103260380
Smith, S.A. (2002). An evaluation of response scale formats of the
Culture Assessment Instrument. Unpublished DPhil thesis, Rand
Afrikaans University, Johannesburg.
Smith, S.A. & Roodt, G. (2003). An evaluation of response scale
formats of the Culture Assessment Instrument. SA Journal of
Human Resource Management, 1(2), pp. 60–75.
Smith, Z. (2016). Construct validation of the DASS-21 in a non-clinical
sample of working adults. Unpublished master’s thesis, University
of Johannesburg, Johannesburg.
Smith, Z., Henn, C. & Hill, C. (2018). Validation of the DASS-21 in a
non-clinical sample of working adults. Manuscript submitted for
publication.
Smoski, W.J., Brunt, M.A., & Tannahill, J.C. (1998). Children’s
Auditory Performance Scale (CHAPS): Instruction manual. Tampa,
FL: The Educational Audiology Association.
Snow, E. L., Likens, A. D., & McNamara, D.S. (2016, December).
Taking control: Stealth assessment of deterministic behaviours
within a game-based system. International Journal of Artificial
Intelligence in Education, 26(4), pp. 1011–1032.
SIOP: Society for Industrial and Organizational Psychology, Inc.
(2003). Principles for the validation and use of personnel selection
procedures (4th ed.). Bowling Green, OH: Author.
SIOPSA: Society for Industrial and Organisational Psychology in
South Africa (2006a). Code of practice for psychological and other
similar assessment in the workplace. Pretoria: Author (in
association with the People Assessment Initiative (PAI)).
SIOPSA: Society for Industrial and Organisational Psychology in
South Africa (2006b). Guidelines for the validation of assessment
procedures. Pretoria: Author (in association with the People
Assessment Initiative (PAI)).
Solano-Flores, G., Trumbull, E. & Nelson-Barber, S. (2002).
Concurrent development of dual language assessments: An
alternative to translating tests for linguistic minorities. International
Journal of Testing, 2(2), pp. 107–129.
Sommer, M. & Dumont, K. (2011). Psychosocial factors predicting
academic performance of students at a historically disadvantaged
university. SA Journal of Psychology, 41(3), pp. 386–395.
Spangler, W.D. (1992). Validity of questionnaire and TAT measures of
need for achievement: Two meta-analyses. Psychological Bulletin,
112, pp. 140–154.
Sparrow, S.S., Cicchetti, D.V., & Balla, D.A. (2005). Vineland-II
Adaptive Behavior Scales (2nd edition). Minnesota: AGS
Publishing.
Sparrow, S.S., Cicchetti, D.V. & Saulnier, C.A. (2016). Vineland
Adaptive Behavior Scales, Third Edition (Vineland-3). San Antonio,
Texas: Pearson.
Spector, P.E. (1988). Development of the work locus of control scale.
Journal of Occupational Psychology, 61, pp. 335–340.
Spielberger, C.D., Gorsuch, R.L. & Lushene, R.E. (1970). The State-
Trait Anxiety Inventory: Test manual for Form X. Palo Alto:
Consulting Psychologists Press.
Spitzer, R. L., Kroenke, K., Williams, J. B., & Löwe, B. (2006). A brief
measure for assessing generalized anxiety disorder: the GAD-7.
Archives of Internal Medicine, 166(10), pp. 1092−1097.
Squires, J., Nickel, R.E., & Eisert, D. (1996). Early detection of
developmental problems: Strategies for monitoring young children
in the practice setting. Developmental and Behavioural Paediatrics,
17(6), pp. 420–427.
Stark, S., Martin, J. & Chernyshenko, O.S. (2016). Technology and
testing: Developments in education, work and health care. In F.T.
Leong, D. Bartram, F.M. Cheung, K.F. Geisinger, & D. Iliescu
(Eds.), The ITC international handbook of testing and assessment
(pp. 395−407). New York, NY: Oxford University Press.
STATSSA (2010). HIV prevalence in 2010: Mid-year population
estimates. Retrieved from http://www.statssa.gov.za
StatSoft, Inc. (2012). StatSoft STATISTICA Electronic Statistics
Textbook. Tulsa, OK: StatSoft. Available at
http://www.statsoft.com/textbook/.
Stead, G.B. & Watson, M.B. (1998). The appropriateness of Super’s
career theory among black South Africans. SA Journal of
Psychology, 28, pp. 40–43.
Stead, G.B., & Watson, M.B. (Eds.). (2017). Career psychology in the
South African context (3rd ed.). Pretoria: Van Schaik.
Steele, G. & Edwards, D. (2002). The development and validation of
the Xhosa translations of the Beck Depression Inventory, the Beck
Anxiety Inventory, and the Beck Hopelessness Scale. Paper
presented at the 8th annual congress of the Psychological Society
of South Africa (PsySSA), Cape Town, South Africa, 24–27
September 2002.
Steele, G. & Edwards, D. (2008). Development and validation of the
isiXhosa translations of the Beck inventories: 2. Item analysis and
internal consistency. Journal of Psychology in Africa, 18, pp. 217–
226.
Sternberg, R.J. (1984). A contextualist view of the nature of
intelligence. International Journal of Psychology, 19, pp. 307–334.
Sternberg, R.J. (2003). Issues in the theory and measurement of
successful intelligence: A reply to Brody. Intelligence, 31, pp. 331–
337.
Sternberg, R.J. (2012). Toward a triarchic theory of human
intelligence. In J.C. Kaufman and E.L. Grigorenko (Eds), The
essential Sternberg: Essays on intelligence, psychology and
education, pp. 33–70. New York: Springer Publishing Company.
Steyn, C., Howcroft, J.G., & Fouché, J.P. (2011). The psychofortology
of female psychiatric out-patients: living with mood and anxiety
disorders. SA Journal of Psychology, 41(3), pp. 288–299.
Storm, L. & Roodt, G. (2002). Die verband tussen
organisasiesosialisering en organisasieverbondenheid [The
relationship between organisation socialisation and organisational
commitment]. SA Tydskrif vir Bedryfsielkunde, 28(1), pp. 14–21.
Strong, E.K. (1927). Vocational Interest Blank. Stanford, CA: Stanford
University Press.
Stroud, L. (2016). Scale A: Foundations of learning. PowerPoint
presentation on the developments in the Griffiths III on the ARICD
site. Available at https://www.aricd.ac.uk.
Stroud, L., Foxcroft, C., Green, E., Bloomfield, S., Cronje, J., Hurter,
K., Lane, H., Marais, R., Marx, C., McAlinden, P., O’Connell, R.,
Paradice, R., & Venter, D. (2016). Part I: Overview, development
and psychometric properties. In Griffiths III: Griffiths Scales of
Child Development (3rd ed.). Hogrefe Ltd. Available at:
https://www.hogrefe.co.uk.
Strümpfer, D.J.W. (1997). Sense of coherence, negative affectivity,
and general health in farm supervisors. Psychological Reports, 80,
pp. 963–966.
Strümpfer, D.J.W. & Wissing, M.P. (1998). Review of the South
African data on the Sense of Coherence Scale as a measure of
fortigenesis and salutogenesis. Paper presented at the 4th Annual
Congress of the Psychological Society of South Africa (PsySSA),
9–11 September 1998, Cape Town, South Africa.
Sue, D., Sue, D. & Sue, S. (1997). Understanding abnormal behavior
(5th ed.). Boston: Houghton Mifflin Company.
Sunderaraman, P., Zahodne, L.B., & Manly, J.J. (2016). A
commentary on ‘generally representative is representative of none:
Pitfalls of IQ test standardization in multicultural settings’ by A.B.
Shuttleworth-Edwards. The Clinical Neuropsychologist, 30(7), pp.
999−1005. doi: http://dx.doi.org/10.1080/13854046.2016.1211321
Super, D.E. (1983). Assessment in career guidance: Toward truly
developmental counseling. Personnel and Guidance Journal, 61,
pp. 555–562.
Super, D.E. (1990). A life-span, life-space approach to career
development. In D. Brown & L. Brooks (Eds.), Career choice and
development, pp. 197–261. San Francisco, CA: Jossey-Bass.
Super, D.E. (1994). A life span, life space perspective on
convergence. In M.L. Savickas & R.L. Lent (Eds), Convergence in
career development theories: Implications for science and practice
(pp. 63–74). Palo Alto, CA: CPP Books.
Super, D.E. (1995). Values: Their nature, assessment and practical
use. In D.E. Super, B. Šverko & C.E. Super (Eds.), Life roles,
values, and careers (pp. 54–61). San Francisco, CA: Jossey-Bass.
Super, D.E., Šverko, B. & Super, C.E. (Eds.). (1995). Life roles,
values, and careers. San Francisco, CA: Jossey-Bass.
Swart, C., Roodt, G., & Schepers, J.M. (1999). Itemformaat,
differensiële itemskeefheid en die faktorstruktuur van ‘n
selfvoltooiingsvraelys [Item format, differential item skewness and
the factor structure of a self-completion questionnaire]. Tydskrif vir
Bedryfsielkunde, 25(1), pp. 33–43.
Swenson, W.M., Rome, H., Pearson, J. & Brannick, T. (1965). A
totally automated psychological test: Experience in a Medical
Center. Journal of the American Medical Association, 191, pp.
925–927.
Taber, B.J., Hartung, P.J., Briddick, H., Briddick, W.C., & Rehfuss,
M.C. (2011). Career style interview: A contextualized approach to
career counseling. The Career Development Quarterly, 59(3), pp.
274–287. doi: 10.1002/j.2161-0045.2011.tb00069.x
Taljaard-Plaut, J., & Strauss, R. (1998). A generation in transition:
Understanding stress-related disorders in African black university
students. Psychoanalytic Psychotherapy in South Africa, 6(2), pp.
56−64.
Tanzer, N.K. (2005). Developing tests for use in multiple cultures and
languages: A plea for simultaneous development. In R.K.
Hambleton, P.F. Merenda, & C.D. Spielberger (Eds.), Adapting
educational and psychological tests for cross-cultural assessment
(pp. 235–244). Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Taylor, C., Jamieson, J., Eignor, D. & Kirsch, I. (1998). The
relationship between computer familiarity and performance on
computer-based TOEFL test tasks. TOEFL Research, 61.
Princeton, NJ: Educational Testing Service.
Taylor, I.A. (2000). The construct comparability of the NEO PI-R
Questionnaire for black and white employees. Unpublished
doctoral thesis, University of the Free State.
Taylor, J.M., & Radford, E.J. (1986). Psychometric testing as an unfair
labour practice. SA Journal of Psychology, 16, pp. 79–96.
Taylor, N. (2004). The construction of a South African five-factor
personality questionnaire. Unpublished master’s dissertation, Rand
Afrikaans University, South Africa.
Taylor, N. (2008). Construct, item, and response bias across cultures
in personality measurement. Unpublished doctoral thesis.
Johannesburg, South Africa: University of Johannesburg.
Taylor, N. (2016). Generally representative is generally representative:
Comment on Shuttleworth-Edwards. The Clinical
Neuropsychologist, 30(7), pp. 1017−1022. doi:
http://dx.doi.org/10.1080/13854046.2016.1213884.
Taylor, N., & De Beer, J. (2010). The relationship between emotional
intelligence and derailment across generations. Johannesburg,
South Africa: JvR Psychometrics.
Taylor, N. & De Bruin, G.P. (2003). Development of the South African
Five Factor Personality Inventory: Preliminary findings. Paper
presented at the Ninth Congress of the Psychological Society of
South Africa (PsySSA), Johannesburg.
Taylor, N., & De Bruin, G.P. (2006). Basic Traits Inventory: Technical
manual. Johannesburg, South Africa: Jopie van Rooyen &
Partners.
Taylor, N., & De Bruin, G.P. (2017). Basic Traits Inventory: Technical
manual (5th Version/Print). Johannesburg, South Africa: JvR
Psychometrics.
Taylor, N., & Vorster, P. (2015). Minnesota Multiphasic Personality
Inventory™-2: Nuclear power facility psychometric properties.
Johannesburg, South Africa: JvR Psychometrics.
Taylor, T.R. (1987). Test bias: The roles and responsibilities of test
users and test publishers. NIPR Special Report Pers – 424.
Pretoria: HSRC.
Taylor, T.R. (1994). A review of three approaches to cognitive
assessment, and a proposed integrated approach based on a
unifying theoretical framework. SA Journal of Psychology, 24(4),
pp. 184–193.
Taylor, T.R. & Boeyens, J.C.A. (1991). The comparability of the
scores of blacks and whites on the South African Personality
Questionnaire: an exploratory study. SA Journal of Psychology,
21(2), pp. 1–10.
Tchanturia, K., Anderluh, M., Morris, R., Rabe-Hesketh, S., Collier, D.,
Sanchez, P., & Treasure, J. (2004). Cognitive flexibility in anorexia
nervosa and bulimia nervosa. Journal of the International
Neuropsychological Society, 10(4), pp. 513–520.
Teig, N., & Scherer, R. (2016, July 19). Bringing formal and informal
reasoning together: A new era of assessment? Frontiers in
Psychology, 7. doi: https://doi.org/10.3389/fpsyg.2016.01097
Temane, Q.M. & Wissing, M.P. (2006). The role of subjective
perception of health in the dynamics of context and psychological
well-being. SA Journal of Psychology, 36(3), pp. 564–581.
Terre’Blanche, M. & Durrheim, K. (1999). Research in practice. Cape
Town: UCT Press.
Theron, C.C. (2007). Confessions, scapegoats and flying pigs:
Psychometric testing and the law. SA Journal of Industrial
Psychology, 33(1), pp. 102–117.
Theron, C. (2009). The diversity-validity dilemma: In search of
minimum adverse impact and maximum utility. SA Journal of
Industrial Psychology, 35(1), pp. 183–195, Art. #765, 13 pages.
doi:10.4102/sajip.v35i1.765. Retrieved on 28 November 2012 from
http://www.sajip.co.za
Theron, D. & Roodt, G. (1999). Variability in multi-rater competency
assessment. Journal of Industrial Psychology, 25(2), pp. 21–27.
Theron, D. & Roodt, G. (2000). Mental models as moderating variable
in competency assessments. Journal of Industrial Psychology,
26(2), pp. 14–19.
Theron, L.C. (2013). Assessing school readiness using the Junior
South African Individual Scales: A pathway to resilience. In S.
Laher & K. Cockcroft (Eds.), Psychological assessment in South
Africa: Research and applications (pp. 60−73). Johannesburg:
Wits University Press.
Thissen, D. & Steinberg, L. (1988). Data analysis using Item
Response Theory. Psychological Bulletin, 104, pp. 385–395.
Thomson, L-A. (2007). The relationship between personality traits and
life balance: A quantitative study in the South African corporate
sector. Unpublished master’s dissertation. Johannesburg, South
Africa: University of Johannesburg.
Thorndike, E.L. (1918). The nature, purposes, and general methods of
measurements of educational products. Chapter II in G.M. Whipple
(Ed.), The seventeenth yearbook of the National Society for Study
of Education. Part II. The Measurement of Educational Products.
Bloomington, IL: Public School Publishing Co.
Thorndike, R.L. & Hagen, E.P. (1977). Measurement and evaluation in
psychology and education (4th ed.). New York: John Wiley &
Sons.
Thornton, G.C. III & Rupp, D.E. (2006). Assessment centers in human
resource management: Strategies for prediction, diagnosis, and
development. Mahwah, NJ: Lawrence Erlbaum.
Thornton, G.C. & Gibbons, A.M. (2009). Validity of assessment
centers for personnel selection. Human Resource Management
Review, 19(3), pp. 169−187. doi:
https://doi.org/10.1016/j.hrmr.2009.02.002.
Thuynsma, C. & de Beer, L.T. (2017). Burnout, depressive symptoms,
job demands and satisfaction with life: Discriminant validity and
explained variance. South African Journal of Psychology, 47(1),
pp. 46−59. doi: https://doi.org/10.1177%2F0081246316638564
Tomita, A., Labys, C. A., & Burns, J.K. (2014). The relationship
between immigration and depression in South Africa: Evidence
from the first South African National Income Dynamics Study.
Journal of Immigrant and Minority Health, 16(6), pp. 1062−1068.
Torgerson, W.S. (1958). Theory and methods of scaling. New York:
John Wiley.
Tracey, T.J.G. (2007). Moderators of the interest congruence-
occupational outcome relation. International Journal of Educational
and Vocational Guidance, 7, pp. 37–45.
Triadó, C., Villar, F., Solé, C. & Celdrán, M. (2007). Construct validity
of Ryff’s scale of psychological well-being in Spanish older adults.
Psychological Reports, 100, pp. 1151−1164.
doi:10.2466/PR0.100.4.1151-1164
Tuerlinckx, F., De Boeck, P., & Lens, W. (2002). Measuring needs
with the Thematic Apperception Test: A psychometric study.
Journal of Personality and Social Psychology, 82(3), pp. 448–461.
doi:10.1037/0022-3514.82.3.448
Ungerer, L.M. & Strasheim, A. (2011). The moderating effect of living
standards on the relationship between individual-level culture and
life satisfaction: A value segmentation perspective. Management
Dynamics, 20(3), pp. 25–50.
Unity Technologies (2017). Home page: Imagine, build and succeed
with Unity. Retrieved on 25 September 2018 from Unity:
https://unity3d.com
U.S. Department of Labor (2017, November). O*NET Resource
Center. Retrieved on 13 September 2018 from:
http://www.onetcenter.org
Valchev, V.H., Nel, J.A., van de Vijver, F.J.R., Meiring, D., De Bruin,
G.P., & Rothmann, S. (2013). Similarities and differences in implicit
personality concepts across ethno-cultural groups in South Africa.
Journal of Cross-Cultural Psychology, 44, pp. 365−388. doi:
10.1177/0022022112443856.
Valchev, V. H., Van de Vijver, F.J.R., Nel, J. A., Rothmann, S., &
Meiring, D. (2013). The use of traits and contextual information in
free personality descriptions across ethnocultural groups in South
Africa. Journal of Personality and Social Psychology, 104, pp.
1077−1091.
Van Dam, N.T. & Earleywine, M. (2011). Validation of the Center for
Epidemiologic Studies Depression Scale-Revised (CESD-R):
Pragmatic depression assessment in the general population.
Psychiatry Research, 186, pp. 128−132. doi:
10.1016/j.psychres.2010.08.018
Van den Berg, A.R. (1996). Intelligence tests. In K. Owen & J.J.
Taljaard (Eds). Handbook for the use of psychological and
scholastic tests of the HSRC, pp. 157–190. Pretoria: HSRC.
Vandenberg, R.J. (2002). Toward a further understanding of and
improvement in measurement invariance methods and
procedures. Organizational Research Methods, 5(2), pp. 139–158.
doi: 10.1177/1094428102005002001
Vandenberg, R. J., & Lance, C.E. (2000). A review and synthesis of
the measurement invariance literature: Suggestions, practices, and
recommendations for organizational research. Organizational
Research Methods, 3(1), pp. 4–70. doi:
10.1177/109442810031002
Van der Merwe, K., Ntinda, K. & Mpofu, E. (2016). African
perspectives of personality psychology. In L.J. Nicholas (Ed.),
Personality psychology (pp. 32−50). Cape Town, South Africa:
Oxford University Press Southern Africa (Pty) Ltd.
Van der Merwe, R.P. (2002). Psychometric testing and human
resource management. SA Journal of Industrial Psychology, 28(2),
pp. 77–86.
Van der Post, W.Z., De Coning, T.J. & v.d.M. Smit, E. (1997). An
instrument to measure organisational culture. South African
Journal of Business Management, 4, pp. 147–165.
Van de Vijver, F.J.R. (2002). Cross-cultural assessment: Value for
money? Applied Psychology: An International Review, 51(4), pp.
545–566.
Van de Vijver, F.J.R. (2016). Test adaptations. In F.T. Leong, D.
Bartram, F.M. Cheung, K.F. Geisinger, & D. Iliescu (Eds.), The ITC
international handbook of testing and assessment (pp. 364−376).
New York, NY: Oxford University Press.
Van de Vijver, F.J.R. & Leung, K. (1997). Methods and data analysis
for cross-cultural research. New York: Sage.
Van de Vijver, F.J.R., & Leung, K. (2011). Equivalence and bias: A
review of concepts, models, and data-analytic procedures. In D.
Matsumoto & F.J.R. van de Vijver, Cross-cultural Research
Methods in Psychology (pp. 17–45). Cambridge, NY: Cambridge
University Press.
Van de Vijver, F.J. R., & Matsumoto, D. (2011). Introduction to the
methodological issues associated with cross-cultural research. In
D. Matsumoto & F.J.R. van de Vijver, Cross-cultural Research
Methods in Psychology (pp. 1–16). Cambridge, NY: Cambridge
University Press.
Van de Vijver, F.J.R., & Tanzer, N.K. (2004). Bias and equivalence in
cross-cultural assessment: an overview. European Review of
Applied Psychology, 54(2), pp. 119–135.
https://doi.org/10.1016/j.erap.2003.12.004.
Van Dierendonck, D. (2004). The construct validity of Ryff’s scales of
psychological well-being and its extension with spiritual well-being.
Personality and Individual Differences, 36, pp. 629−643.
doi:10.1016/S0191-8869(03)00122-3
Van Eeden, R. (1991). Manual for the Senior South African Individual
Scale – Revised (SSAIS–R). Part I: Background and
standardization. Pretoria: HSRC.
Van Eeden, R. & Mantsha, T.R. (2007). Theoretical and
methodological considerations in the translation of the 16PF5 into
an African language. SA Journal of Psychology, 37(1), pp. 62–81.
Van Eeden, R. & Prinsloo, C.H. (1997). Using the South African
version of the 16PF in a multicultural context. SA Journal of
Psychology, 27, pp. 151–159.
Van Eeden, R. & Van Vuuren, J. (2017). The cognitive processing
potential of infants: Exploring the impact of an early childhood
development programme. The South African Journal of Childhood
Education 7(1), a499. doi: https://doi.org/10.4102/sajce.v7i1.499.
Van Eeden, R. & Visser, D. (1992). The validity of the Senior South
African Individual Scale – Revised (SSAIS–R) for different
population groups. SA Journal of Psychology, 22, pp. 163–171.
Van Heerden, R. (2007). Exploring Normal South African and British
Children: A Comparative Study utilizing the Griffiths Mental
Development Scales – Extended Revised. Unpublished master’s
treatise. Nelson Mandela Metropolitan University.
Van Lingen, J.M., Douman, D.L., & Wannenburg, J.M. (2011). A
cross-sectional exploration of undergraduate nursing student
wellness and academic outcomes at a South African higher
education institution. SA Journal of Psychology, 41(3), pp. 396–
408.
Van Ommen, C. (2005). Putting the PC in IQ: Images in the Wechsler
Adult Intelligence Scale – Third Edition (WAIS-III). SA Journal of
Psychology, 35(3), pp. 532–554.
Van Rooyen, K. (2005). The Performance of South African and British
Children on the Griffiths Mental Development Scales – Extended
Revised: A Comparative Study. Unpublished master’s treatise.
Nelson Mandela Metropolitan University.
Van Rooyen, K. & Nqweni, Z.C. (2012). Culture and Posttraumatic
Stress Disorder (PTSD): a proposed conceptual framework. SA
Journal of Psychology, 42(1), pp. 51–60.
Van Rooyen, S.W. (2000). The psychosocial adjustment and stress
levels of shift workers in the South African Police Service, Flying
Squad. Unpublished master’s treatise, University of Port Elizabeth.
Van Tonder, C. & Roodt, G. (2008). Organisation development:
Theory and practice. Pretoria: Van Schaik.
Van Wijk, C.H. (2012). Self-reported generalised anxiety and
psychomotor test performance in healthy South Africans. SA
Journal of Psychology, 42(1), pp. 7–14.
Van Wijk, C.H. & Meintjies, W.A.J. (2015a). Grooved pegboard for
adult employed South Africans: Normative data and Human
Immunodeficiency Virus associations. South African Journal of
Psychology, 45(4), pp. 521–535. doi:
https://doi.org/10.1177/0081246315587692
Van Wijk, C.H. & Meintjies, W.A.J. (2015b). Grooved pegboard for
adult employed South Africans: Socio-demographic influences.
South African Journal of Psychology, 45(4), pp. 536–550. doi:
https://doi.org/10.1177/0081246315587693
Van Zyl, C.J.J., & Becker, J. (2012). Investigating the ethnic
equivalence of the 16PF5-SA. Paper presented at the XXX
International Congress of Psychology, CTICC, Cape Town, South
Africa.
Van Zyl, C.J.J., & De Bruin, G.P. (2016). Work-related Risk and
Integrity Scale technical manual (2016 edition). Johannesburg,
South Africa: JvR Psychometrics.
Van Zyl, C.J.J., & Taylor, N. (2012). Evaluating the MBTI® Form M in
a South African context. SA Journal of Industrial Psychology/SA
Tydskrif vir Bedryfsielkunde, 38(1), Art. #977, 15 pages.
doi: http://dx.doi.org/10.4102/sajip.v38i1.977. Available at
http://www.sajip.co.za/index.php/sajip/article/viewFile/977/1249
Van Zyl, E.S. (2004). Die effek van biografiese veranderlikes op die
vlakke en oorsake van stres by ’n groep finalejaar-mediese
studente [The effect of biographical variables on the levels and
causes of stress in a group of final-year medical students]. South
African Family Practice, 46(4), pp. 26–29.
Van Zyl, E. & Visser, D. (1998). Differential item functioning in the
Figure Classification test. SA Journal of Industrial Psychology,
24(2), pp. 25–33.
Van Zyl, E.S. & Van der Walt, H.S. (1991). Manual for the Experience
of Work and Life Circumstances Questionnaire. Pretoria: HSRC.
Van Zyl, L. E., & Rothmann, S. (2012). Flourishing of students in a
tertiary education institution in South Africa. Journal of Psychology
in Africa, 22(4), pp. 593−599.
Van Zyl, P.M., Joubert, G., Bowen, E., Du Plooy, F., Francis, C.,
Jadhunancan, S., Fredericks, F. & Metz, L. (2017). Depression,
anxiety, stress and substance use in medical students in a five-
year curriculum. South African Journal of Health Professions
Education, 9(2), pp. 67−72. doi:10.7196/AJHPE.2017.v9i2.705
Veenhoven, R. (1991). Is happiness relative? Social Indicators
Research, 24(1), pp. 1–34.
Venter, A. (2006). The relationship between the big five personality traits
and emotional intelligence competencies among employees in a
Durban based medical aid administrator. Unpublished master’s
thesis. Durban, South Africa: University of KwaZulu-Natal.
Vernon, P.A. (1990). An overview of chronometric measures of
intelligence. School Psychology Review, 19(4), pp. 399–410.
Vincent, K.R. (1991). Black/White IQ differences: Does age make the
difference? Journal of Clinical Psychology, 47(2), pp. 266–270.
Visser, D., & Du Toit, J.M. (2004). Using the Occupational Personality
Questionnaire (OPQ) for measuring broad traits. SA Journal of
Industrial Psychology, 30(4), pp. 65–77.
Visser, D., & Viviers, R. (2010). Construct equivalence of the OPQ32n
for Black and White people in South Africa. SA Journal of Industrial
Psychology/SA Tydskrif vir Bedryfsielkunde, 36(1), pp. 1–11. Art.
#748, 11 pages. doi: 10.4102/sajip.v36i1.748
Vogt, L. & Laher, S. (2008). The five-factor model of personality and
individualism/collectivism in South Africa: An exploratory study.
Psychology in Society, 37, pp. 39−54.
Vogt, L. & Laher, S. (2009). The relationship between
individualism/collectivism and the Five Factor Model of personality:
An exploratory study. Psychology in Society, 37, pp. 39–54.
Von Stumm, S. (2012). You are what you eat? Meal type, socio-
economic status and cognitive ability in childhood. Intelligence, 40,
pp. 576–583.
Vrana, S. R., & Vrana, D.T. (2017). Can a computer administer a
Wechsler Intelligence Test? Professional Psychology: Research
and Practice, 48(3), pp. 181−198.
Vygotsky, L.S. (1978). Mind in society: The development of higher
psychological processes. Cambridge, MA: Harvard
University Press.
Wang, P.S., Beck, A.L., Berglund, P., McKenas, D.K., Pronk, P.N.,
Simon, G.E. & Kessler, R.S. (2004). Effects of major depression on
moment-in-time work performance. The American Journal of
Psychiatry, 161(10), pp. 1885–1892.
Ward, C.L., Lombard, C.J., & Gwebushe, N. (2006). Critical incident
exposure in South African emergency services personnel:
Prevalence and associated mental health issues. Emergency
Medical Journal, 23, pp. 226–231. doi: 10.1136/emj.2005.025908
Watkins, C.E., Jr., Campbell, V.L., Nieberding, R., & Hallmark, R.
(1995). Contemporary practice of psychological assessment by
clinical psychologists. Professional Psychology: Research and
Practice, 26, pp. 54–60.
Watson, D., Clark, L.A., & Tellegen, A. (1988). Development and
validation of brief measures of positive and negative affect: The
PANAS scales. Journal of Personality and Social Psychology,
54(6), pp. 1063–1070.
Watson, M., & Kuit, W. (2007). Postmodern career counselling and
beyond. In K. Maree (ed.), Shaping the story: A guide to facilitating
narrative counselling (pp. 73–86). Pretoria, South Africa: Van
Schaik.
Wechsler, D. (1944). The measurement of adult intelligence (3rd ed.).
Baltimore: Williams & Wilkins.
Wechsler, D. (1958). The measurement and appraisal of adult
intelligence (4th edition). Baltimore: Williams and Wilkins.
Wechsler, D. (2014). Wechsler Adult Intelligence Scale-Fourth Edition.
Bloomington, MN: NCS Pearson.
Weiss, D.J., Dawis, R.V., England, G.W. & Lofquist, L.H. (1967).
Manual for the Minnesota Satisfaction Questionnaire. Minneapolis:
University of Minnesota.
Weiss, L. (2003). WISC-IV: Clinical reality and the future of the four-
factor structure. Paper presented at the 8th European Congress of
Psychology, Vienna, Austria, 6–11 July 2003.
Wessels, S.M. (2011). The Children’s Auditory Performance Scale
(CHAPS): An Exploration of its Usefulness in Identifying Central
Auditory Processing Disorder in a South African Clinical Setting.
Unpublished Doctoral thesis, Nelson Mandela Metropolitan
University, Port Elizabeth.
Westaway, M. S., & Maluka, C.S. (2005). Are life satisfaction and self-
esteem distinct constructs? Black South African perspectives.
Psychological Reports, 97, pp. 567–575.
Whetzel, D. L., & McDaniel, M.A. (2009). Situational judgment tests:
An overview of current research. Human Resource Management
Review, 19, pp. 188–202. doi:10.1016/j.hrmr.2009.03.007
Whipple, G.M. (Ed.). (1918). The seventeenth yearbook of the
National Society for Study of Education. Part II: The Measurement
of Educational Products. Bloomington, IL: Public School Publishing
Co.
WHO Global Status Report on Alcohol (2004). Retrieved 27
September 2008, from
http://www.who.int/substance_abuse/publications/en/south_africa.p
WHO Online Question and Answer: What is Mental Health? (2008).
Retrieved 27 September, 2008, from
http://www.who.int/features/qa/62/en/index.html
Wicken, A.W. (2000). Wellness in industry. Work, 15, pp. 95–99.
Wiggett, C. (2006). Guide dog ownership and psychological well-
being. Unpublished doctoral dissertation, University of
Stellenbosch, South Africa.
Wilcocks, R.W. (1931). The South African Group Test of Intelligence.
Stellenbosch: University of Stellenbosch.
Williamson, D.M., Bauer, M., Steinberg, L.S., Mislevy, R.J., Behrens,
J.T. & DeMark, S.F. (2004). Design rationale for complex
performance assessment. International Journal of Testing, 4(4),
pp. 303–332.
Willig, A.C. (1988). A case of blaming the victim: The Dunn
monograph on bilingual Hispanic children on the U.S. mainland.
Hispanic Journal of Behavioral Sciences, 10, pp. 219–236.
Wilson, M. (2004). Developing measures: An item response modeling
approach. Mahwah, NJ: Lawrence Erlbaum Associates.
Winter, D.G., & Stewart, A.J. (1977). Power motive reliability as a
function of retest instructions. Journal of Consulting and Clinical
Psychology, 45, pp. 436–440.
Winter, D.G., Stewart, A.J., John, O.P., Klohnen, E.C., & Duncan, L.E.
(1998). Traits and motives: Toward an integration of two traditions
in personality research. Psychological Review, 105, pp. 230–250.
Wise, S.L., Barnes, L.B., Harvey, A.L., & Plake, B.S. (1989). Effects of
computer anxiety and computer experience on computer-based
achievement test performance of college students. Applied
Measurement in Education, 2, pp. 235–242.
Wissing, J.A.B., Du Toit, M.M., Wissing, M.P., & Temane, Q.M.
(2008). Psychometric properties of various scales measuring
psychological well-being in a South African context: The FORT 1
project. Journal of Psychology in Africa, 18(4), pp. 511–520.
Wissing et al. (2010). Validation of three Setswana measures for
psychological wellbeing. SA Journal of Industrial Psychology,
36(2), pp. 1–8.
Worley, J.A., Vassar, M., Wheeler, D.L. & Barnes, L.B. (2008). Factor
structure of scores from the Maslach Burnout Inventory: A review
and meta-analysis of 45 exploratory and confirmatory factor-
analytic studies. Educational & Psychological Measurement, 68,
pp. 797–823.
Wyse, A.E. & Mapuranga, R. (2009). Differential Item Functioning
analysis using Rasch item information functions. International
Journal of Testing, 9, pp. 333–357.
Xu, G., Wang, C., & Shang, Z. (2016). On initial item selection in
cognitive diagnostic computerized adaptive testing. British Journal
of Mathematical and Statistical Psychology, 69, pp. 291−315.
Yang, W., Lance, C.E. & Hui, H.C. (2006). Psychometric properties of
the Chinese Self-Directed Search (1994 edition). Journal of
Vocational Behavior, 68(3), pp. 560–576.
Yeo, R.A., Ryman, S.G., Pommy, J., Thoma, R.J., & Jung, R.E.
(2016). General cognitive ability and fluctuating asymmetry of brain
surface area. Intelligence, 56, pp. 93−98.
Zenisky, A.L. & Sireci, S.G. (2002). Technological innovations in
large-scale assessment. Applied Measurement in Education, 15,
pp. 337–362.
Zhou, J. (2009). A review of assessment engineering principles with
select applications to the Certified Public Accountant Examination.
Technical report prepared for The American Institute of Certified
Public Accountants. Retrieved 30 November 2012 from
http://www.aicpa.org/becomeacpa/cpaexam/psychometricsandscori
a_review_of_assessment_engineering.pdf
Zhu, J. & Tulsky, D.S. (2000). Co-norming the WAIS-III and WMS-III:
Is there a test-order effect on IQ and memory scores? The Clinical
Neuropsychologist, 14(4), pp. 461–467.
Zickar, M.J., Cortina, J., & Carter, N.T. (2010). Evaluation of
measures: Sources of sufficiency, error, and contamination. In J.L.
Farr & N.T. Tippins, Handbook of Employee Selection (pp. 399–
416). New York: Psychology Press (Taylor & Francis Group).
Zikmund, W.G., Babin, B.J., Carr, J.C., & Griffin, M. (2010). Business
research methods (8th edition). Mason, OH: South-Western
Publishing.
Zinn, S. & Langdown, N. (2011). E-book usage amongst academic
librarians in South Africa. South African Journal of Libraries and
Information Science, 77(1), pp. 104−115.
Zuilkowski, S.S., McCoy, D.C., Serpell, R., Matafwali, B. & Fink, G.
(2016). Dimensionality and the development of cognitive
assessments for children in sub-Saharan Africa. Journal of Cross-
Cultural Psychology, 47(3), pp. 341–354. doi:
http://dx.doi.org/10.1177/0022022115624155
Zunker, V.G. (2016). Career counselling: A holistic approach. Boston,
MA: Cengage Learning.
Index
Page numbers in italics indicate information contained in
illustrations or tables.
A
ability 183–184
cognitive 292
groups 189–190
ability level and reliability 65–66
Ability, Processing of Information, and Learning Battery 27, 273
absenteeism 206
and HIV/Aids 207
and ill health 206
and mental illness 208
academic achievement 75–76, 290–291
academic counselling 239
accountability 308
educational 290
acculturation 322
acquiescence bias 65
administering measures 144–145, 145, 328
administering measures, practitioner’s duties 161
assessment anxiety 158–159
control over groups 156
establishing rapport 156–158
managing irregularities 160–161
mental disabilities 163–164
motivating test-takers 156
physical disabilities 163
providing instructions 159–160
recording behaviour 161
relationship between practitioner, test-taker 155–158
scientific attitude 155–156
time limits 160
young children 162–163
administering measures, practitioner’s duties after assessment 166
collecting, securing materials 164–165
recording, scoring, interpreting 165
administering measures, preparation 146, 154
checking conditions 148–150
checking materials, equipment 147–148
ethical issues 147
familiarity with measures, instructions 148
informed consent 153–154
linguistic factors 152–153
personal circumstances, timing 150–151
planning sequence, length 151–152
and test sophistication 153
administrative error and reliability 66
Adolescent Personality Questionnaire 227
adult assessment 158
Adult Neuropsychological Questionnaire 147
adverse impact 273
Affectometer-2 Scale 214
affect, positive and negative 203
African-centred assessment 8–9
African-centred item development 342–343
African-centred measures 337, 337–338, 339
African-centred non-verbal cognitive reasoning stimuli 91
African-centred, Western-oriented approaches combined 335, 336, 336–337, 341–342
age-related changes 315–316
alcohol abuse 208
alternate-form reliability 62
Annual National Assessments 290
answer mechanisms, protocols 92–93
anti-test lobby 34, 35
anxiety 205
assessment 158–159
and test-taking 158–159, 330
apartheid 25–28
apperception 236–237
aptitude
and career counselling 244
school-age learners 291
see also intelligence measures, aptitude
Aptitude Test for School Beginners 193, 194, 284
artificial intelligence 257, 262–263
assessment 3–4
African-centred 8–9
definition 4
shift in role of professionals 4–5
and technological developments 4–5
and tests, testing 4–5
tools 4
use in modern society 33–35
assessment centres 276–277
assessment measures and tests 5
approximate nature of 6–7
main characteristics of 5–6
as only one source of information 6
terms of variation 6
assessment, international perspective
early developments 15–16
early twentieth century 16–17
influence of multiculturalism 21–22
influence of technology 19, 20, 20–21
measurement challenges 17–19
standards, training, test users’ roles 22–25
and World Wars 17, 18
assessment process 7–8
multidimensional information gathering 7, 7
and opinion 8
synthesis, integration of information 8
assessment, retrospective overview 12–13
early origins 13–15
modern international perspective 15–25
assessment rooms 148–149
association, measures of 50, 50, 51, 51
astrology 13
Automated Item Generation 198
autonomy 203

B
Basic Traits Inventory 227–228, 228, 337
and careers 246–247
Bayley Scales of Infant Development 185, 285
Beck Depression Inventory 215
Beck Depression Scale 146
behavioural cluster analysis 305–306
Bender-Gestalt Test 195
bias
construct 118
and construct validity 332–333
definition 108
extremity 65
prediction 116–118
see also equivalence, statistical approaches to
Big Five model of personality traits 227
bilingual assessment 104
bilingual test-taker approach 106
Bill of Rights, test-taker’s 309
Binet-Simon Scale 16–17, 170
biological context 315–317
biological intelligence 172
biological measures of intelligence 173–174
bipolar disorder 205
Broad Band Competency Assessment Battery 274
burnout 209, 210, 218–219
C
Campbell Interest and Skills Survey 274
career, academic counselling 239
Career Construction Interview 252
career construction theory and life-design counselling 251–253
career-counselling assessment 242–243
and aptitude 244
in a changing environment 253
developmental approach 248–249
and intelligence 244
and personality 246–247
person-environment fit approach 243–248
systems approach 249–251
and values 247
and vocational interests 245–246
Career Interest Profile 252, 274
career maturity 248
Career Path Finder 274
category scales 42, 42
Cattell-Horn-Carroll model 174
causal interpretation 303
Caveon test security 331
central tendency, measures of 47–48, 48–49
Centre for Epidemiologic Studies Depression Scale 215–216, 340
cheating 331–332
minimising 149–150
children
administering assessments 162–163
conveying test results 309–310
and rapport 157
research 282–283
Children’s Personality Questionnaire 226–227
Chinese Personality Inventory 22
chirology 14
classification of psychological measures 132–134, 135, 135
Clerical Test Battery 273
clinical interpretation 306–307
clinical practice 34–35
closed questions 296
coaching 139
coefficient
of determination 78
of equivalence 62
of internal consistency 62
of stability 62
Coefficient Alpha 63, 63
cognitive ability vs cognitive functions 292
see also under intelligence
cognitive diagnostic assessment 171
cognitive development 174
cognitive functioning 169–170, 292
Cognitive Process Profile 273
comparative rating formats 46
competency assessment approach 272–273
Competency Assessment Series 274
competency-based assessment 23–24
competing values 140, 140–141, 141–142, 296–297
composite scale 46
computer-based and Internet-delivered assessment 255–256,
263 artificial intelligence 262–263
and cloud storage 259
ethical and legal use 264–267
gamified assessments 260–261
important terms 256–257
and simulations 257, 260–262
in South Africa 258
in the twenty-first century 258–263
virtual content 261
computer-based tests 19–21, 29, 32, 87
administration 256
interpretation 257
scoring 256
computerised adaptive testing 170, 256
computerised simulation assessments 257
concept-driven adaptations 102
conceptual intelligence 174
concurrent validity 73–74
confidentiality 307–308
configural invariance 114
confirmatory factor analysis 112–113
co-norming of measures 52
consistency of measurement 60–61
constant-sum scales 44, 44
construct bias 118
construct-identification procedures 72–73
constructional performance 196
constructivism 243, 249, 251
construct maps 86–87
content-description procedures 71–72
content validity 71–72
contextual intelligence 172, 174
continuous data 40
convergent validity 73
conveying test results 307
children’s 309–310
ethical considerations 307–308
methods of 308–309
corrected reliability coefficient 62, 62
counsellors, registered 129
counterproductive behaviour 238–239
criterion
contamination 76, 78
deficiency, relevance 76
keying 86, 231
measures 75–77
criterion-prediction
procedures 73–75
validity 73
criterion-referenced achievement tests 291
criterion-referenced assessment 67
criterion-referenced measures 85
interpretation 307
Critical Reasoning Test Battery 273
cross-cultural test adaptation 21, 101–102
challenges in South Africa 103–104
equivalence of adapted versions of measures 104–107
equivalence of measures 104–105
linguistic equivalence 105–107
multicultural test development 120–122
reasons for 101
statistical approaches to equivalence 107–119
steps for maximising success 120
types of 102, 102–103, 103
cross-cultural use of personality assessment measures 232–233
cross-validation 75
crystallised intelligence 173
culturally appropriate measures 341, 341
African-centred item development 342–343
capacity-building 345
delineating norm groups 343–344
ethical practices 345–347
qualitative measures 344
repositories of research studies 344–345
culturally sensitive Rorschach Comprehensive System 35
cultural meaningfulness 334–335
combining African-centred, Western-oriented approaches 335, 336, 336–337, 341–342
indigenous, African-centred measures 337, 337–338, 339
roadmap for next phase 341–347
Western-oriented measures for SA 339, 340, 340–341
culture and results 321–323
culture-free, culture-reduced, culture-common tests 21
Curriculum Assessment Policy Statements 291
Customer Service Aptitude Profile 273
cut-off scores 56–57, 302

D
data
continuous 40
displaying 47, 47
generating nominal 44, 44
generating ordinal 45, 45
Denver Developmental Screening Test 282
depersonalisation 218
depression 205
Depression, Anxiety and Stress Scale 215
descriptive interpretation 302–303
developmental approach, career-counselling 248–249
developmental scales 52–53
developmental screening 282–283, 285
deviation IQ scale 54, 55, 55
Diagnostic and Statistical Manual of Mental Disorders 5 (DSM-V) 205, 215
diagnostic measures
application of results 196–197
cognitive diagnostic assessment 171
comprehensive 282
infants, preschool children 281–282, 284–285
scholastic tests 197
see also psychodiagnostic assessment
Differential Aptitude Tests 192, 193, 194
differential impact of subgroups 77–78
differential item functioning 109, 110
differential prediction 116
differential validity 74
diffuse brain injury 316–317
digital literacy 326–327
Digit-symbol Coding Subtest 146
discriminant validity 73
discrimination index, items 94, 94
distal/proximal factors 323
domains of functioning 5
Draw-A-Person test 238, 283
DSM-V see Diagnostic and Statistical Manual of Mental Disorders 5
dynamic assessment 170
intelligence 174–175
school-age learners 292–293

E
education 28–29
and aptitude 193–194
higher 34, 293–294
scholastic tests 197
educational achievement measures 52–53, 290–291
educationally disadvantaged 157
educationally focused screening 283–284
education, school-age learners 288
achievement measures 290–291
aptitude measures 291
cognitive measures 291–292
dynamic assessment 292–293
and educational accountability 290
personality-related measures 292
effort and test-taking 330
emic approach 233
emotional intelligence 86, 89, 92, 171, 176
emotionality 159
Employee Assistance Programme 211
Employment Equity Act
and assessment, modern: South Africa 30–32
and computer-based tests 266
and prediction bias 117–118
and reliability 61
environmental mastery 203
equivalence, statistical approaches to 107
of adapted versions of measures 104–107
and bias 107–109
and bias at item level 109–112
and bias in factor structures between groups 112–115
construct bias and construct equivalence 118
and invariance 108
and measurement bias 108
and measurement equivalence 108–109
method bias 119
nomological bias 119
and prediction bias 116–118
errors 41–42
error variance invariance 115
essay questions 291
ethical considerations
administering measures, preparation 147
computer-based and Internet-delivered assessment 264–267
conveying test results 307–308
see also fair and ethical practices
ethical dilemmas 139
ethical practices
culturally appropriate measures 345–347
statutory control of assessment measures, SA 136–139
see also fair and ethical practices
etic approach 232–233
eudaimonic perspective 202, 203
evaluation of psychological measures 132–134, 135, 135
evaluative interpretation 303–304
even-numbered vs odd-numbered rating 46–47
evidence 5
expectancy
tables 56, 56
of tester 156
Experience of Work and Life Circumstances Questionnaire 216–217
experimental psychologists 16
experimental version of measure 84, 84
expert witnesses 298–299
exploratory factor analysis 112
extraverts, introverts 230
extremity bias 65

F
face validity 71–72
factor analysis 72–73, 224
factor-analytic strategy 226
factorial validity 72–73
factor variance-covariance invariance 115
factual-driven adaptations 102
factual vs inferred information 296
fair and ethical practices 136
and assessment practitioners 136–137
competing values 140, 140–141, 141–142
defining 136
ethical dilemmas 139
in industry 140, 140–141, 141–142
multiple constituents 139–140, 140–141, 141–142
preparing test-takers 138–139
professional practices 137–138
responsibilities of organisations 142–143
rights, responsibilities of test-takers 138
faking bad/good 231, 330–331
familiarity/recognisability-driven adaptations 103
Family Environment Scale 301
feeling individuals 230
15 Factor Questionnaire 227
15 Factor Questionnaire Plus 274
fluid intelligence 173
Folstein’s Mini Mental Status Examination 146
forced-choice items 88
forced-choice scales 44
generating nominal data 44, 44
generating ordinal data 45, 45
forensic psychologists and test use 131
format-driven adaptations 103
Full Scale IQ 180
full-score equivalence 113–114
functional equivalence 113

G
gamified assessments 257, 260–261, 278
General Ability Measure for Adults 273
General Adaptability Battery 26
General Health Questionnaire 214–215
Generalised Anxiety Disorder scale 216
General Reasoning Test Battery 273
General Scholastic Aptitude Test 188, 189, 190
Global Intelligence Scale 180–181
Goodenough-Harris Drawing Test 283
grade equivalents 52–53
Graduate Talent Simulations 274, 275
graphic rating scales 44, 44
graphology 14–15
Griffiths III 181, 182–183, 185
Griffiths Mental Development Scales – Extended Revised 181
group characteristics 279
group processes 278
group structure 279
Grover-Counter Scale of Cognitive Development 181, 183, 185
Guttman scales 45, 45

H
halo effect 65
Hamilton Depression Scale 146
happiness 202
Health Professions Act 28, 32, 127–128
classifying a measure 134–135
and linguistic factors 152–153
hedonistic perspective 202
higher education 34
well-being of students 211–213
High School Personality Questionnaire 227
HIV/Aids
test battery 146–147
in the workplace 206–207
Hogan assessment series 274
Hogan Development Survey 239
Hogan Motives, Values, and Preferences Inventory 274
Hogan Personality Inventory 229
home environment 323–324
Home Screening Questionnaire 301
honesty contract 332
Human Sciences Research Council 31, 34
humorology 13–14

I
identity 222
in-basket test 278
incremental validity 74
Individual Scale for General Scholastic Aptitude 180
industrial psychologists and test use 130
industry 34, 271–272
and aptitude 194
assessment of individuals 272–278
assessment of organisations 279–280
assessment of workgroups, work teams 278–279
ethical assessment practices in 140, 140–141, 141–142
mental illness in 207–208
personnel selection 272–275
infant intelligence, predictability of 315
infant, preschool developmental assessment 280
developmental assessment, shortcomings 285
developmental screening 282–283
diagnostic measures 284–285
educationally focused screening 283–284
functions assessed 281
info. from other observations 285–286
why we assess development 280–281
inferred information 296
influence of tester 330
information-processing and intelligence 174
informed consent 153–154
informed opinions 300
input/output-based approaches 272–273, 275
integrated descriptive narrative 306
integrated identity 222
integrity tests 332
intelligence
biological 172
and career counselling 244
defining 171–173
deviation IQ scale 54, 55, 55
emotional 171, 172, 176
psychometric 172
social or contextual 172
intelligence measures, aptitude 191–192
application of results 193–194
description and aim 192, 192–193, 193
educational context 193–194
industrial context 194
intelligence measures, group 188
application of results 189–191
description and aim 188–189
general ability 189–190
research in SA context 190–191
intelligence measures, individual 180
application of results 183–187
description and aim 180–183
general ability 183–184
profile of abilities 184
research in the SA context 184, 185–186, 186–187
intelligence measures, scholastic tests 197
intelligence measures, specific cognitive functions 194–195
application of results 196–197
constructional performance 196
description and aim 195–196
motor performance 196
perceptual measures 195–196
research in SA context 197
intelligence measures, trends in 197–198
Intelligence Quotient (IQ) 177, 291, 315, 316
deviation IQ 177
standardised scores 177
intelligence score and diversity
individual differences, cultural diversity 178–179
intelligence score 176–177
intelligence testing
brief history, overview 170–171
construct, method and item equivalence 178–179
misuse of 17
Wechsler Intelligence Scales 17–18, 21
intelligence, theories of 173
biological measures 173–174
Cattell-Horn-Carroll model 174
conceptual intelligence, systems approach 174
contextual intelligence 174
dynamic assessment 174–175
emotional intelligence 176
information-processing approach 174
learning potential 175
multiple factors 173
multiple intelligences 174
single general factor 173
stages of cognitive development 174
intensity scales 43, 43–44
intercept bias 117
inter-item consistency 62–63, 63
International Test Commission 21, 120, 136, 138, 264
Internet-delivered testing 19–21, 256–257
interpretation 301–302
causal 303
clinical 306–307
combining mechanical, non-mechanical 305–306
criterion-referenced measures 307
descriptive 302–303
evaluative 303–304
mechanical 304
non-mechanical 305
norm-referenced tests 306–307
predictive 303
and validity 301–304
inter-scorer reliability 63, 63
inter-scorer reliability coefficient 63, 63
interval scale 40
interviews
psychodiagnostic assessment 295–297
structured, unstructured 276
intrapsychic context 317–318
intra-scorer reliability 64, 64
intra-scorer reliability coefficient 64, 64
introverts 230
intuitive individuals 230
invariance 108
ipsative scales (and scoring) 45, 45
ipsative vs normative scale options 47
ISO Standard for Assessment in Organisational Settings 23
item bias 95
item characteristic curve 120–121, 121
item response theory 120–121
and computerised adaptive testing 256
and detection of differential item functioning 121
items
determine difficulty 93, 93, 94
determine discriminating power 94, 94
development, creation, sourcing 90–92
differential item functioning 95
discrimination index 94, 94
final pool 95
item-response theory 94–95
item-total correlation 94
reviewing, arranging 92
revising, selecting 96

J
Job Demands-Resources model 208–209, 209
job satisfaction 210
Job Specific Solutions 274
judging vs perceiving individuals 230
Jung Personality Questionnaire 230–231
Junior South African Individual Scales 180–181, 183, 185
juvenile delinquents 157

K
K-ABC 185
Kant, Immanuel 38
known-groups validity 74

L
language
bias 89
and results 319–321
see also under linguistic
language and test adaptation 102, 103–104
back-translation 106
forward-translation 105–106
Learning Orientation Index 274
learning potential 319
and intelligence 175
Learning Potential Computerised Adaptive Test 89, 171, 188–189, 190, 273, 337
leniency bias 65
life-design counselling 251–253
Life Orientation Test – Revised 216
Life Role Inventory 248–249
life satisfaction 203
Likert-type scales 42, 43, 43
linearly transformed standard scores 53
linguistic equivalence 105–107
empirical investigations 106–107
judgemental approaches 105–106
linguistic factors 152–153
administering measures, preparation 152–153
see also under language
linguistics-driven adaptations 102
Locus of Control Inventory 217

M
malingering 330–331
Mantel-Haenszel procedure 110
Maree Career Matrix 246
Maslach Burnout Inventory 218–219
mastery testing 67
maximal and typical performance 273
MB-10 245
McCall’s T-score 54, 54
McCarthy Scales of Mental Abilities 284–285
McMaster Health Index Questionnaire 217
mean 48, 48
measurement 38
absolute zero 39
categories of levels 39–41
definition 38
equal intervals 38–39, 39
errors 41–42
levels of 38–39
magnitude 38
properties of scales 38–39
psychology professionals who may use 128–131
measurement and statistical concepts 37–38
definition of measurement 38
measurement invariance approaches 114–115
measurement scales 42
deciding on format 46–47
different scaling options 42–45
measures, developing 82
administration 96
assemble, pre-test experimental version 92–93
classification 97
item-analysis phase 93–95
item development, creation, sourcing 90–92
planning phase 83–90
publish, refine continuously 96–98
revise, standardise final version 95–96
steps in 82–83, 84, 84–98
technical evaluation, establishing norms 96
measures, evaluating 98–99
Mechanical Insight test 192
mechanical, non-mechanical interpretation 304–306
median 48, 48
mental age 315–316
mental-age scales 52
mental disabilities 163–164
practitioner’s duties 163–164
mental disorders, deficiencies 16
mental illness 204–205
in the workplace 207–208
mental status examination 296
meta-analysis 75
method bias 89
methodological considerations 328
influence of tester 330
patterns in scores 328–330
standardised procedures 328
status of test-taker 330–332
metric equivalence 113
metric invariance 114
Minnesota Multiphasic Personality Inventory 18, 231–232
Minnesota Satisfaction Questionnaire 219
mode 48
moderated multiple-regression analysis 116
moderator variables 78
Motivational Profile 274
motivation and test-taking 156, 330
motives 235–237, 236–238
motor performance 196
multiculturalism 21–22
see also cross-cultural test adaptation
multicultural test development 120–122
multicultural test development, simultaneous 121
multidimensional computerised adaptive testing 171
multidimensional information gathering 7, 7
multimedia items 88
multiple-choice vs essay questions 291
multiple intelligences 174
multiple regression equation 51, 51
multiple sources of information 7, 7
Murphy-Meisgeier Type Indicator 230
Myers-Briggs Type Indicator 229–231
and careers 246
and personnel selection 274
My System of Career Influences 250, 338

N
negative past experiences 157
negative self-orientated thoughts 159
neuropsychologists and test use 131
nominal measurement levels 40
non-mechanical interpretation 304–306
non-response errors 64–65
normalised standard score 54
normally distributed test scores 47, 47
norm-driven adaptations 102
norm groups, establishing 51–52
norm-referenced tests, interpretation 306–307
norms 51–57
definition 51
norm scores, scales 52–55
interrelationships of 55
issues with use of 57
setting standards, cut-off scores 55–57

O
objective, subjective formats 88
Occupational Interest Profile 274
Occupational Personality Profile 274
Occupational Personality Questionnaire 229, 274
open and closed questions 296
open-ended items 88
opinions 8, 300
ordinal measurement levels 40
organisational characteristics 279
organisational processes 279
organisational research 280
organisational structures 279
organisations
assessment of 279–280
ethical responsibilities 142–143
output-based approaches 272–273, 275
overcrowding 323–324

P
paired-comparison scales 44, 44
palmistry 14
parameter estimates 113
patterns in scores 328–330
Pearson product-moment correlation 50, 50
Pearson product-moment correlation coefficient 50, 50
perceiving individuals 230
percentages and percentiles 53
percentile rank score 53
perceptual measures 195–196
Perceptual Reasoning Index 180, 181–182
performance
constructional 196
job 76
maximal and typical 273
motor 196
ratings 275–276
in specialised training 76
performance-based items 88
personal growth 204
personality
and career-counselling 246–247
and counterproductive behaviour 238–239
early views on 13–14
personality assessment 221
conceptual scheme 222–223
cross-cultural use of measures 232–233
relatively stable personality traits 223–235
structured measures 18
uses and applications 239–240
personality assessment, motives and concerns 235–236
explicit motives 236
implicit motives 236–237
projective methods 236–238
Personality Inventory for Children 232
person-environment fit 243–248
personnel selection 239, 272–275
phrenology 14
physical disabilities 163
physical impairments 316–317
physical well-being 150
physiognomy 13
portfolio assessment 291
Positive Affect Negative Affect Scales 214
poverty 324
power measures 6
practice effects 332
practice examples, tests 139
pragmatics-driven adaptations 103
prediction bias and equivalence 116
and Employment Equity Act 117–118
procedures to establish 116, 117, 117
predictive validity 74
preparation prior to session 146
selecting measures for test battery 146–147
preschoolers 157, 162–163
prisoners, juvenile delinquents 157
Processing Speed Index 180, 182
Professional Board for Psychology 32, 36, 132–136
proficiency tests 197
profile analysis 304
Progressive International Reading Literacy Study 290
projective assessment techniques 222
projective tests 88
proximal factors 323
psychodiagnostic assessment 294–295
interviews 295–297
limitations of 299–300
psychological knowledge, expertise 298–299
psycholegal assessment 298–299
psycholegal contexts 295
psychological assessment 24
psychological measurement see measurement
psychologists and test use 130
psychometric intelligence 172
psychometrics 4, 15–16
Psychometrics Committee 32, 132–135
psychometrists and test use 129
psychopathology 318
psychotechnicians and test use 128
psychotherapists and test use 129
psychotherapy 239
public protection 135–136
purpose in life 204
purposive falsification 65

Q
quality of life 202
Quanti-Skill Accounting Assessment 273
question- vs statement-item formats 46

R
random errors 41
range 49
range restriction and reliability 65, 66
rapport 156–158, 330
Rasch model 110
ratings 76
ratio 40
rational method of measure 86
Raven Progressive Matrices 21, 191
Raven’s Advanced Progressive Matrices 178, 339
recognisability-driven adaptations 103
regression analysis 79, 79
Regulations Defining the Scope of the Profession of Psychology 127–128, 128–131, 131
reliability coefficient 61, 62–66
inter-scorer 63
intra-scorer 64
magnitude of 67
reliability of a measure 59–60
approaches to estimating 64
defining 60–61, 61
factors affecting 64–66
interpretation of 66–67
types of 61–64
reporting results, written form 310–311
research
cognitive functioning 170–171, 184–187, 190–191, 194–197
computer-based assessment 260–262
cross-cultural measures 339–341, 344–345
organisational 280
personality assessment 227, 232, 233, 239
well-being 203
research psychologists and test use 130
respondent error 64–66, 66
response bias 64–65, 89
response set 222
results, factors affecting 313–314
results in context 314
age-related changes 315–316
biological context 315–317
culture 321–323
environmental factors 323–325
home environment 323–324
intrapsychic context 317–318
language 319–321
physical impairments 316–317
psychopathology 318
schooling 318–319
socio-economic status 324–325
test-wiseness 325–327
transient conditions 318
urbanisation 325
Rey-Osterrieth Complex Figure Test 146
rights, responsibilities of test-takers 138
role saliency 248
Rorschach Comprehensive System 35
Rorschach Inkblot Test 88, 160, 237–238, 340
Rosenthal effect 156
Ryff’s Psychological Well-being Scales 217–218

S
sample homogeneity 78
San people 27
Satisfaction with Life Scale 216
scalar invariance 114–115
scatter 329
scholastic tests 197
school-age children 157, 291–292
School-entry Group Screening Measure 288–289
School-readiness Evaluation by Trained Testers 284
scientific attitude 155–156
self-acceptance 203
Self-directed Search 245
self-selection bias 64–65
semantic differential scales 43, 43
Senior South African Individual Scale 181, 183, 186
Sense of Coherence 216
sensing vs intuitive individuals 230
sentence-completion tests 238
serious games 260–261, 278
SHL Verify ability tests 192, 192, 194
simple regression equation 50, 50
simulations 257, 260–262, 277
simultaneous multicultural test development 121
simultaneous multilingual test development 22
single attribute vs comparative rating formats 46
single dimensional vs composite scale 46
single general factor 173
Situational Judgment Tests 277
situational tests 276–278
16 Personality Factor Questionnaire
and careers 246
and cultural groups 340
personality assessment 224–225, 225, 225–227
and personnel selection 274
skewed test scores 47, 47
social-desirability bias 65
social intelligence 172
socially desirable responses 222–223
socio-economic status 324–325
Sources of Work Stress Inventory 218
South African Burnout Scale 219, 338
South African Career Interest Inventory 245, 338
South African Personality Inventory 121, 227, 233, 234–235, 337
speech impairments 317
speeded measure, reliability of 65
speed measures 6
speed tests 88–89
split-half reliability 62, 62
standard deviation 49–50, 50
standard error
of estimation 78, 78–79, 79
of measurement 329
standardised conditions 16
standardised procedures 328
standardised tests 18
Standard Measurement Error 67, 67
standard normal distribution 51
standard scores 53–54
stanine scale 54, 54
State-Trait Anxiety Inventory 214
statistical concepts 47–51
status of test-taker 330–332
statutory control of assessment measures, SA
exercise of control 127–136
fair and ethical practices 136–139
reasons for 126
sten scale 54, 54
stringency bias 65
Strong Interest Inventory 274
structural equivalence 113
structured methods 222
subgroup norming 57
subjective formats 88
subjective well-being 202–203
substance use disorders 208
successive test development 120–121
systematic errors 41–42
systems theory framework 249–250

T
teacher-developed measures 291
Technical Test Battery 273
terminological/factual-driven adaptations 102
test adaptations 102, 102–103, 103
test bias 6
Test Commission of the Republic of South Africa 132–133
test-development teams 83–85
aim, target population, nature of measure 85
content of the measure 85–87
core and expert 85, 85
and test plans 85
test formats, items 88
testing measurement equivalence 114
testing mode 87
test manuals 5, 96–97
test plan specifications 89
test quality 23
test-retest reliability 61–62
test review systems 23
test sophistication 153
test-teach-test approach 293
test-wiseness 325–327
Thematic Apperception Test 236–237
theory-driven adaptations 102
thinking vs feeling individuals 230
third-party systems 257
360-degree competency 275
time limits 88–89
Token Test 195
Trail-making Test 146
TRAM 175, 273
transient conditions 318
traumatic brain injury 316–317
Trends in International Mathematics and Science Study 290
true score 61
type of response labels vs unlabelled response categories 46
typical performance 273

U
unconscious misrepresentation 65
unitary validity 77
urbanisation 325
Utrecht Work Engagement Scale 219

V
validity
concurrent 303
construct 72, 303, 332–333
content 71–72, 303
convergent 73
criterion-prediction 73
differential 74
and interpretation 301–304
validity coefficient 77
factors affecting 77–78
magnitude of 77
validity generalisation 75
validity of a measure 69–70
defining 70–71
indices, interpretation of 77–79
types of 71–77
validity triangle 71, 71
Value Orientations 274
values 247
and career-counselling assessment 247
competing 140, 140–141, 141–142, 296–297
variability, measure of 49–50
variance 49, 49
variances, predictor/criterion variables 78, 78
variations
in assessment conditions and reliability 66
in instructions and reliability 66
in interpretation of instructions and reliability 66
in scoring or ratings and reliability 66
vendor-neutral assessment portals 257
Verbal Comprehension Index 180, 181
Verify Range of Ability Tests 274
see also SHL Verify ability tests
vignettes 278
Vineland Adaptive Behaviour Scales 282–283
virtual content 261
virtual reality/augmented reality 257
vocational interests 245–246

W
Wechsler Adult Intelligence Scale 180, 181–182, 183, 186, 340
Wechsler Intelligence Scales 17–18, 21
for children 21
well-being 200–201
defining 201
domains of 202–205
and positive psychology 201–202
social, economic determinants 203
subjective 202–203
well-being in the workplace 205, 208–210
cost of ill health 206–208
wellness programmes 210–211
why it matters 205–206
well-being, measures of 213
in diverse contexts 213–218
in the work context 218–219
well-being, university students 211–213
domains of holistic view of 212
Wellness Questionnaire for Higher Education 217, 338
workaholism 210
work engagement 209–210
Working Memory Index 180, 182
Work Locus of Control Scale 219
Work-related Risk and Integrity Scales 238–239

Y
young children 162–163

Z
z-scores 53, 55
