
HPS431/715

Psychological Assessment
Seminar 2: Understanding Test Norms and Reliability
Seminar Objectives

1. Complete group administration of a verbal memory test to demonstrate
the use of normative data
• The Rey Auditory Verbal Learning Test (RAVLT)
2. Understand how to access and use normative data
3. Understand how to calculate z-scores and use the z-scores to interpret an
individual client’s test performance relative to the normative population
4. Examine how the reliability of a test can influence interpretation of an
individual client’s test scores
Part A
Psychological Testing: An example of a commonly used memory test
Memory functioning: Some Key Terms
Attention Stimuli

Encoding – Process of Memory Storage

Retrieval – Process of Remembering

• Free Recall – Retrieval without the aid of cues
• Cued Recall – Retrieval with the aid of cues
• Recognition – Stimulus triggers remembering
Rey Auditory Verbal Learning Test (RAVLT)
• Group Administration of the RAVLT
• This is an auditory memory test, usually administered to an individual client to help understand
cognitive strengths and weaknesses
• It allows the clinician to objectively assess how much information an individual client can learn
(encoding processes) and remember (retrieval processes) after a 20-minute delay.

Instructions
• After I finish reading out the list of words, please write down as many words as you can
remember in any order that you like (no speaking)
• The test has multiple trials, so we will repeat this process several times. Use a separate
piece of paper or a new Word document for each trial so that you can’t see earlier lists
Let’s Practice the RAVLT

• We will now administer the first two trials of the RAVLT (the learning trials)

Remember:
• Write down as many words as you can remember AFTER the full
list is read
• You can write down the words in any order
What does the full RAVLT involve?

• Trials 1 to 5 are the same list of words read 5 times (List A)


• Trial 6 is a new list of distractor words (List B)
• Trial 7 is the original list without being re-read (post-distraction recall)
• Trial 8 is a delayed recall of the first list (approximately 20 to 30 minutes after
trial 7)
• Trial 9 is a recognition trial for the first list (e.g., read a long list of words and
the client responds “yes” or “no” as to whether each word was on the first list)
List A      Trials 1-5 (all List A)   List B      Trial 6 (B)   Trial 7 (A)   Trial 8 (A after 20 mins)
Tower                                 Engine
Dog                                   Roof
Mother                                Wave
Gun                                   Tree
Flower                                Battle
College                               Ring
Rainbow                               Donut
Tea                                   Elephant
Uncle                                 Triangle
Lake                                  Swamp
Sunshine                              Apple
Friend                                Bank
Circle                                Table
Chair                                 Lipstick
Shirt                                 Glass
Tower (A)     Home          Ring (B)       Swamp (B)      Bank (B)
Window        Wave (B)      Tea (A)        Hot            Stocking
Dog (A)       Flower (A)    Flower         Sunshine (A)   Table (B)
Barn          Garden        Uncle (A)      Water          Chair (A)
Engine (B)    Balloon       Donut (B)      Farmer         Lipstick (B)
Mother (A)    Tree (B)      Elephant (B)   Rose           Nest
Weather       Battle (B)    Crayon         Apple (B)      Children
Gun (A)       College (A)   Triangle (B)   Friend (A)     Shirt (A)
Hand          Mouse         Lake (A)       Stranger       Toffee
Roof (B)      Rainbow (A)   Fountain       Circle (A)     Glass (B)

(A) Words from list A; (B) words from list B
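
To make the trial structure concrete, here is a minimal Python sketch of how one free-recall trial could be tallied against the two word lists. The scoring rule is an assumption for illustration only (one point per distinct List A word, with repeats and List B words counted separately); the seminar does not state a scoring rule, and the function name `score_free_recall` is hypothetical.

```python
# Word lists as given in the seminar's List A / List B table.
LIST_A = {"tower", "dog", "mother", "gun", "flower", "college", "rainbow", "tea",
          "uncle", "lake", "sunshine", "friend", "circle", "chair", "shirt"}
LIST_B = {"engine", "roof", "wave", "tree", "battle", "ring", "donut", "elephant",
          "triangle", "swamp", "apple", "bank", "table", "lipstick", "glass"}


def score_free_recall(responses, targets=LIST_A, distractors=LIST_B):
    """Tally one free-recall trial (assumed rule: 1 point per distinct target word)."""
    seen = set()
    tally = {"correct": 0, "repetitions": 0, "other_list": 0, "not_on_lists": 0}
    for raw in responses:
        word = raw.strip().lower()
        if not word:
            continue  # ignore blank entries
        if word in seen:
            tally["repetitions"] += 1      # same word written twice
        elif word in targets:
            tally["correct"] += 1          # distinct word from the target list
        elif word in distractors:
            tally["other_list"] += 1       # word from the other list
        else:
            tally["not_on_lists"] += 1     # word that appears on neither list
        seen.add(word)
    return tally


example = score_free_recall(["Tower", "dog", "engine", "Dog", "banana"])
print(example)
```

For Trial 6 the roles would swap, so `targets=LIST_B` and `distractors=LIST_A` could be passed instead.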


Part B
Now we have a client’s memory test score – what are we going to do with it?
How do you infer meaning from our client’s test score?
We use normative data! Let’s break this down further…..

• The RAVLT is what we call a norm-referenced test

• Norm-referenced testing is a way of deriving meaning from test scores by
comparing an individual client’s score to scores of a comparison group of
test-takers.

• Norms are the test performance data of a particular group of test-takers that
are used as a reference group when interpreting an individual client’s test
scores.
Where do we find normative data?
• Test manuals from widely used, gold-standard tests (e.g. Wechsler Scales)

• Peer-reviewed journal articles based on research into less commonly used tests
(e.g. RAVLT)

• Norms handbooks/compendiums, which draw together norms from a range of
different tests (e.g. memory, executive function, visuospatial skills),
e.g. Strauss’s Compendium of Neuropsychological Tests
How is normative data presented?
Any comments about these norms?
Part C
Case study:
72-year-old male presenting with memory problems. How does our client “measure up”
against the RAVLT test norms?

Case study:
A 72-year-old male is administered the RAVLT.
His total score = 20.
Interpret his score using:
• Schmitt (1996) norms
• Geffen (1995) norms
Evaluating our client’s performance with norms
• Is there an easier way to use the information?
• Convert to standardised scores
• E.g. to convert to a z-score:

$$z = \frac{\text{observed score} - \text{mean of normative sample}}{\text{SD of normative sample}}$$

• Therefore for our patient who scored 20:
• What is their z-score using Schmitt (1996) norms?
• What is their z-score using Geffen (1995) norms?

Norms            Observed score   Mean (normative sample)   SD (normative sample)   z-score?
Schmitt (1996)   20               37.1                      7.5
Geffen (1995)    20               32.6                      8.3
Worked answers for our patient who scored 20:

Norms            Observed score   Mean (normative sample)   SD (normative sample)   z-score
Schmitt (1996)   20               37.1                      7.5                     (20 − 37.1) / 7.5 = −2.28
Geffen (1995)    20               32.6                      8.3                     (20 − 32.6) / 8.3 = −1.52
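
The z-score conversion above can be sketched in a few lines of Python. This is a minimal illustration, not part of the seminar materials: the function name `z_score` and the dictionary layout are assumptions, while the means and SDs are the values given in the table above.

```python
def z_score(observed: float, norm_mean: float, norm_sd: float) -> float:
    """Standardise a raw score against a normative sample."""
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    return (observed - norm_mean) / norm_sd


# Normative (mean, SD) pairs as given in the seminar's table.
norms = {
    "Schmitt (1996)": (37.1, 7.5),
    "Geffen (1995)": (32.6, 8.3),
}

client_score = 20
z_by_norms = {
    name: round(z_score(client_score, mean, sd), 2)
    for name, (mean, sd) in norms.items()
}
print(z_by_norms)
```

The same raw score of 20 yields a different z-score under each set of norms, because the score is compared with a different reference mean and spread each time.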
Evaluating our client’s performance with norms
Schmitt (1996) z-score = −2.28
Geffen (1995) z-score = −1.52

What do these z-scores tell us?
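
One common way to read a z-score is as a percentile rank. The sketch below assumes the normative scores are approximately normally distributed, which the seminar does not state, so treat the output as a rough guide only. The helper names `normal_cdf` and `percentile_rank` are illustrative.

```python
import math


def normal_cdf(z: float) -> float:
    """Cumulative probability of the standard normal distribution at z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def percentile_rank(z: float) -> float:
    """Percentage of the normative sample expected to score at or below z.

    Assumption: scores in the normative sample are normally distributed.
    """
    return 100.0 * normal_cdf(z)


for label, z in [("Schmitt (1996)", -2.28), ("Geffen (1995)", -1.52)]:
    print(f"{label}: z = {z:.2f}, percentile rank = {percentile_rank(z):.1f}")
```

Under this assumption the client falls at roughly the 1st percentile against the Schmitt (1996) norms and roughly the 6th percentile against the Geffen (1995) norms.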


So which norms do you use?

• Any initial thoughts?


• What information do you want to know before making your decision?
So which norms do you use?

• Selecting the most appropriate norms requires good knowledge of the
research and publications associated with the test
• Legitimate interpretation also demands careful consideration of the
procedures and standardisation sample that produced the norms
• Understanding these issues reflects one aspect of competence in testing
(more on this in Week 4 when we discuss the APS Ethical Code)
Part D
Case study:
How does the reliability coefficient of the test impact our interpretation of our client’s
memory test score?
What about the reliability of the test?

• How would your interpretations change if the test was unreliable?

• How can we tell how reliable our client’s total score of 20 on the
RAVLT actually is?
Let’s calculate the 95% CI for the client’s observed score
To do this, we first need to calculate the Standard Error of Measurement (SEM):

$$\text{SEM} = \sigma\sqrt{1 - r_{xx}}$$

Sx or σ = standard deviation of sample

rxx = reliability coefficient

Where do we find these? What values do we use?

Then we use the SEM to calculate our 95% confidence interval

$$95\%\ \text{CI of test score} = \text{test score} \pm 1.96(\text{SEM})$$
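
The two formulas above can be sketched in Python as follows. The function names `sem` and `ci95` are illustrative, and the inputs in the example (SD = 7.5, reliability = .89, test score = 20) are used here only as sample values.

```python
import math


def sem(sd: float, rxx: float) -> float:
    """Standard Error of Measurement: sd * sqrt(1 - rxx)."""
    if not 0.0 <= rxx <= 1.0:
        raise ValueError("rxx must lie between 0 and 1")
    return sd * math.sqrt(1.0 - rxx)


def ci95(score: float, sd: float, rxx: float, z_crit: float = 1.96):
    """95% confidence interval around an observed test score."""
    margin = z_crit * sem(sd, rxx)
    return score - margin, score + margin


# Sample inputs: normative SD of 7.5, reliability of .89, observed score of 20.
lower, upper = ci95(20, sd=7.5, rxx=0.89)
print(f"SEM = {sem(7.5, 0.89):.2f}, 95% CI = [{lower:.2f}, {upper:.2f}]")
```

Because the code does not round the SEM before building the interval, the last decimal place can differ slightly from a hand calculation that rounds at each step.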
When reliability coefficient of test is high

Norms          rxx   σ     SEM: σ√(1 − rxx)        Test Score (TS)   Lower 95% CI: TS − 1.96(SEM)   Upper 95% CI: TS + 1.96(SEM)
Schmitt 1996   .89   7.5   7.5√(1 − .89) = 2.49    20                20 − 1.96(2.49) = 15.1         20 + 1.96(2.49) = 24.9
Geffen 1995    .89   8.3   8.3√(1 − .89) = 2.75    20                20 − 1.96(2.75) = 14.6         20 + 1.96(2.75) = 25.4

When reliability coefficient of test is low

Norms          rxx   σ     SEM: σ√(1 − rxx)        Test Score (TS)   Lower 95% CI: TS − 1.96(SEM)   Upper 95% CI: TS + 1.96(SEM)
Schmitt 1996   .42   7.5   7.5√(1 − .42) = 5.71    20                20 − 1.96(5.71) = 8.8          20 + 1.96(5.71) = 31.2
Geffen 1995    .42   8.3   8.3√(1 − .42) = 6.32    20                20 − 1.96(6.32) = 7.6          20 + 1.96(6.32) = 32.4

This gives us our observed score with 95% CI

We can convert the confidence interval into z-scores using the same method
we used previously, i.e.:

$$z = \frac{\text{score} - \text{mean of normative sample}}{\text{SD of normative sample}}$$

Norms          rxx   σ     SEM    TS [95% CI]        z-score (observed)          z-score (lower CI)            z-score (upper CI)
Schmitt 1996   .89   7.5   2.49   20 [15.1, 24.9]    (20 − 37.1) / 7.5 = −2.28   (15.1 − 37.1) / 7.5 = −2.93   (24.9 − 37.1) / 7.5 = −1.63
Geffen 1995    .89   8.3   2.75   20 [14.6, 25.4]    (20 − 32.6) / 8.3 = −1.52   (14.6 − 32.6) / 8.3 = −2.17   (25.4 − 32.6) / 8.3 = −0.87
Schmitt 1996   .42   7.5   5.71   20 [8.8, 31.2]     (20 − 37.1) / 7.5 = −2.28   (8.8 − 37.1) / 7.5 = −3.77    (31.2 − 37.1) / 7.5 = −0.79
Geffen 1995    .42   8.3   6.32   20 [7.6, 32.4]     (20 − 32.6) / 8.3 = −1.52   (7.6 − 32.6) / 8.3 = −3.01    (32.4 − 32.6) / 8.3 = −0.02
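
As a rough sketch, the whole chain (SEM, 95% CI, then z-scores for the observed score and both interval bounds) can be run for each set of norms and each reliability value in one pass. The function name `z_with_ci` and the `NORMS` dictionary are illustrative; the means, SDs and reliability values are those used in this seminar.

```python
import math

# Normative (mean, SD) pairs and reliability values used in the seminar.
NORMS = {"Schmitt (1996)": (37.1, 7.5), "Geffen (1995)": (32.6, 8.3)}
RELIABILITIES = (0.89, 0.42)


def z_with_ci(score, norm_mean, norm_sd, rxx, z_crit=1.96):
    """Return z-scores for the observed score and its 95% CI bounds."""
    sem = norm_sd * math.sqrt(1.0 - rxx)          # Standard Error of Measurement
    lower = score - z_crit * sem                  # lower 95% CI (raw score)
    upper = score + z_crit * sem                  # upper 95% CI (raw score)

    def to_z(x):
        return (x - norm_mean) / norm_sd

    return to_z(score), to_z(lower), to_z(upper)


for rxx in RELIABILITIES:
    for name, (mean, sd) in NORMS.items():
        z, z_lo, z_hi = z_with_ci(20, mean, sd, rxx)
        print(f"{name}  rxx = {rxx:.2f}  z = {z:.2f} [{z_lo:.2f}, {z_hi:.2f}]")
```

Intermediate values are not rounded here, so the final decimal place may differ slightly from the hand-calculated table. The pattern is the same either way: the lower the reliability, the wider the interval around the same observed z-score.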
• Now putting it into one nice table, we have both the
unstandardised and standardised (z) results

Norms          rxx   σ     SEM    Raw/Unstandardised TS [95% CI]   Standardised z [95% CI]
Schmitt 1996   .89   7.5   2.49   20 [15.1, 24.9]                  −2.28 [−2.93, −1.63]
Geffen 1995    .89   8.3   2.75   20 [14.6, 25.4]                  −1.52 [−2.17, −0.87]
Schmitt 1996   .42   7.5   5.71   20 [8.8, 31.2]                   −2.28 [−3.77, −0.79]
Geffen 1995    .42   8.3   6.32   20 [7.6, 32.4]                   −1.52 [−3.01, −0.02]

Does your interpretation change?
Take home messages
• Clinicians must always consider the appropriateness of the norms
being used to interpret their client’s individual performance on a
given test
• It is critical to use reliable tests, otherwise a client’s observed score
may not be an accurate reflection of their theoretical ‘true’
score (and thus the test is of little use)
