PSYC 3200 - Tests & Measurement
↳ 19th century
  * Germany * France * England * North America
↳ individual differences
  * Galton
↳ experimental
  * Fechner
  * Wundt ~ founder of the first psych. lab
↳ mental measurement
  * Francis Galton
    • inheritance of genius (e.g. reaction times)
    • co-relations
    • believed in eugenics
  * worked w/ Galton on mental measurement
  ↳ Alfred Binet ~ identifying children in need of special instruction
↳ self-report items
  * Woodworth Personal Data Sheet
  * MMPI
↳ projective
  * Rorschach inkblot test
statistics review
→ scales of measurement
  ↳ nominal
    * not numerical; numbers are labels
  ↳ ordinal
    * rank ordering
  ↳ interval
  ↳ ratio
→ frequency distributions
  ↳ displays scores, showing how often each score occurs
→ describing distributions
  ↳ measures of central tendency
    * median ~ middle score
  ↳ measures of variability
    * range = highest score - lowest score
    * variance ~ $\sigma^2 = \frac{\sum (X - \bar{X})^2}{N}$ ($\bar{X}$ = mean)
    * standard deviation ~ represents the average deviation of scores from the mean
      $\sigma = \sqrt{\frac{\sum (X - \bar{X})^2}{N}}$
  ↳ peakedness (kurtosis)
    * flat distribution ~ platykurtic
    * high peak ~ leptokurtic
    * normal ~ mesokurtic
→ transformations
  ↳ percentiles
  ↳ z-score ~ how far away from the mean the score is, in standard deviation units
    * $z = \frac{X - \bar{X}}{S}$
  ↳ T-score ~ mean = 50, SD = 10
  ↳ standard scores (IQ scores) ~ mean = 100, SD = 15
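A minimal Python sketch of the transformations above, assuming the population formulas (dividing by N) and a small set of hypothetical raw scores. The function names (`population_sd`, `z_score`, `t_score`, `iq_style_score`) are illustrative only, not part of any testing package.

```python
import math

def mean(scores):
    """Arithmetic mean of a list of scores."""
    return sum(scores) / len(scores)

def population_sd(scores):
    """Standard deviation using the N (population) formula."""
    m = mean(scores)
    variance = sum((x - m) ** 2 for x in scores) / len(scores)
    return math.sqrt(variance)

def z_score(x, scores):
    """How many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean(scores)) / population_sd(scores)

def t_score(z):
    """Rescale a z-score to a T-score: mean 50, SD 10."""
    return 50 + 10 * z

def iq_style_score(z):
    """Rescale a z-score to an IQ-style standard score: mean 100, SD 15."""
    return 100 + 15 * z

# Hypothetical raw scores for illustration only
scores = [2, 4, 4, 4, 5, 5, 7, 9]
z = z_score(9, scores)
print(mean(scores), population_sd(scores))  # 5.0 2.0
print(z, t_score(z), iq_style_score(z))     # 2.0 70.0 130.0
```

Because T-scores and IQ-style scores are just rescaled z-scores, the same z of 2.0 always maps to a T of 70 and a standard score of 130.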
→ measures of symmetry and "peakedness"
  * skewness
    • negative ~ tail points to the left; most observations are high
→ percentile rank
  ↳ determines how many items in a data set fall below a given point
    * e.g. placed 62/63 runners in a race
  ↳ $PR = \frac{B}{N} \times 100$ = percentile rank of $X_i$
    * B = number of scores below $X_i$; N = total number of scores
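The percentile rank formula can be sketched directly in Python. This assumes B counts only scores strictly below the score of interest (some texts also count half of any tied scores); the function name and the ten scores are hypothetical.

```python
def percentile_rank(x, scores):
    """Percentile rank of x using PR = (B / N) * 100.

    B = number of scores strictly below x, N = total number of scores.
    """
    below = sum(1 for s in scores if s < x)
    return below * 100 / len(scores)

# Hypothetical test scores for ten people
scores = [55, 60, 62, 65, 70, 71, 75, 80, 85, 90]
print(percentile_rank(75, scores))  # 60.0 -> 6 of 10 scores fall below 75
```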
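→ percentiles
  * a percentile deals in raw score units (the score below which a given percentage of cases falls)
  * quartiles ~ divide the distribution into fourths
  * interquartile range ~ scores between the 25th and 75th percentiles
→ norms ~ performance of defined groups of individuals on a particular test
  ↳ reference group ~ used as a standard to which scores are compared; shows relative position in a population
  ↳ group-based norming
    * aka: demographically corrected norms
→ types of correlation coefficients
  * Pearson r ~ relationship between 2 continuous variables
  * point biserial ~ relationship between a continuous and a true dichotomous variable (naturally falls into 2 categories)
  * biserial ~ relationship between a continuous and an artificially dichotomous variable (underlying continuous)
→ regression
  * linear relationships
  * slope and intercept (y = mx + b)
  * regression to the mean
  * range restriction ~ restricted variability
→ correlation ~ measure of the linear relationship between two variables
  * values range from -1 to +1
  * statistical significance influenced by N
→ coefficient of determination ~ $r^2$, the proportion of variance accounted for
→ coefficient of alienation ~ $\sqrt{1 - r^2}$, a measure of nonassociation between the two variables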
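A short Python sketch tying the correlation and regression terms together, using hypothetical paired scores. It computes Pearson r, the least-squares slope and intercept (y = mx + b), the coefficient of determination, and the coefficient of alienation by hand; the function names are illustrative.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: strength of the linear relationship between x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def regression_line(x, y):
    """Least-squares slope m and intercept b for predicting y from x (y = mx + b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    intercept = my - slope * mx
    return slope, intercept

# Hypothetical paired scores (e.g. predictor x, criterion y)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
m, b = regression_line(x, y)
determination = r ** 2               # proportion of variance in y accounted for by x
alienation = math.sqrt(1 - r ** 2)   # coefficient of alienation (nonassociation)

print(round(r, 3), round(m, 2), round(b, 2), round(determination, 2))  # 0.775 0.6 2.2 0.6
```

Here r of about .77 means that x accounts for 60% of the variance in y, leaving 40% unexplained.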
→ multiple regression
  * can have any number of predictors
  * each regression coefficient represents the unique contribution of an IV in predicting the DV
  * more often see standardized coefficients as well as unstandardized coefficients
  * capitalization on chance
  * shrinkage
  * randomly splitting samples (cross-validation)
↳ factor analysis
  * goals: create "new" variables; the resulting variables can be fit to the sample data

class 3, week 3
reliability
→ reliability ~ the consistency of scores obtained by the same persons
  ↳ the first requirement of a "good" test
  ↳ consistent and replicable
  ↳ variability between people
→ classical test theory
  ↳ observed score = true score + measurement error ($X = T + E$)
  ↳ measurement error ~ "noise"
    * adds variability but does not affect average group performance
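A small simulation can illustrate the classical test theory claim that random error adds variability without shifting the group average. This sketch assumes normally distributed true scores (mean 100, SD 15) and purely random error (mean 0, SD 5); these parameters are hypothetical, chosen only for illustration.

```python
import random
import statistics

random.seed(42)  # fixed seed so the simulation is repeatable
n = 20_000

# Hypothetical parameters: true scores ~ N(100, 15), random error ~ N(0, 5)
true_scores = [random.gauss(100, 15) for _ in range(n)]
errors = [random.gauss(0, 5) for _ in range(n)]
observed = [t + e for t, e in zip(true_scores, errors)]  # X = T + E

mean_t = statistics.fmean(true_scores)
mean_x = statistics.fmean(observed)
var_t = statistics.pvariance(true_scores)
var_x = statistics.pvariance(observed)

print(round(mean_t, 1), round(mean_x, 1))  # group means barely differ
print(round(var_t), round(var_x))          # observed variance is larger
print(round(var_t / var_x, 2))             # close to 15**2 / (15**2 + 5**2) = 0.9
```

The last line is the ratio of true-score variance to observed-score variance, i.e. the reliability of this simulated test.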
→ sources of error
  ↳ content heterogeneity
  ↳ time sampling
  ↳ testing / experimental conditions
  ↳ person-oriented conditions
→ estimating reliability
  * r = degree of relationship between two sets of scores
  * reliability reflects the relative influence of true and error scores on obtained test scores
  * range of values: 0-1 (only positive)
  * $r_{xx} = \frac{\sigma_T^2}{\sigma_X^2} = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2}$
  ↳ split-half reliability
    * calculate the correlation between two halves of the test (two sets of scores)
    * estimate the full-test reliability using a correction formula (e.g. Spearman-Brown)
  ↳ homogeneity / internal consistency reliability ~ the degree to which items on a particular measure relate to each other
    * Cronbach's α is equivalent to the average of all possible split-half estimates
→ methods of assessing reliability
  ↳ alternate forms / parallel tests ($r_{\text{form}_1\text{form}_2}$)
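A minimal sketch of split-half reliability in Python. The notes do not name the correction formula; this assumes the Spearman-Brown formula (the usual choice) and an odd-even split of items. The response matrix and function names are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_brown(r, length_factor=2):
    """Predicted reliability when the test is made `length_factor` times longer."""
    return length_factor * r / (1 + (length_factor - 1) * r)

def split_half_reliability(item_matrix):
    """Odd-even split-half reliability, corrected to full length with Spearman-Brown.

    item_matrix: one row per person, one column per item (1 = correct, 0 = wrong).
    """
    odd_half = [sum(row[0::2]) for row in item_matrix]   # items 1, 3, 5, ...
    even_half = [sum(row[1::2]) for row in item_matrix]  # items 2, 4, 6, ...
    r_halves = pearson_r(odd_half, even_half)            # correlation of the two halves
    return spearman_brown(r_halves)                      # step up to full test length

# Hypothetical responses: 5 people x 6 items
responses = [
    [1, 1, 1, 1, 0, 1],
    [1, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 0, 0],
]
print(round(split_half_reliability(responses), 2))
```

The correction is needed because each half is only half as long as the real test, and shorter tests are less reliable.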
    * the two forms are correlated
    * statistical considerations for parallel tests
      • have the same standard deviation
      • correlate with the same set of true scores
      • error is truly random
    * drawbacks
      • expensive
      • difficult to create equivalent forms
      • reactivity ~ experience may influence results
      • time frame (too far apart / too close together)
  ↳ inter-rater reliability ($r_{\text{rater}_1\text{rater}_2}$)
    • used when raters judge a phenomenon (e.g., panel interview)
    • if ratings are continuous, use the correlation between ratings
    • rater-ratee effects can be difficult to overcome
  ↳ internal consistency estimates
    • KR-20 ~ used for dichotomous items only (e.g. T/F)
    • Cronbach's alpha ~ a more general case of KR-20; can be used for dichotomous items or continuously scaled items (e.g., Likert type)
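Cronbach's alpha and KR-20 can be sketched in Python using population variances throughout. For 0/1 items the variance of an item equals p × q, so the two functions give the same value on dichotomous data, which illustrates why alpha is described as the more general case. The response matrix and function names are hypothetical.

```python
import statistics

def cronbach_alpha(item_matrix):
    """Cronbach's alpha for a people-by-items score matrix (any item scaling).

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(item_matrix[0])
    item_columns = list(zip(*item_matrix))                  # one tuple per item
    sum_item_var = sum(statistics.pvariance(col) for col in item_columns)
    total_var = statistics.pvariance([sum(row) for row in item_matrix])
    return k / (k - 1) * (1 - sum_item_var / total_var)

def kr20(item_matrix):
    """KR-20 for dichotomous (0/1) items: p * q replaces each item variance."""
    k = len(item_matrix[0])
    n = len(item_matrix)
    sum_pq = 0.0
    for col in zip(*item_matrix):
        p = sum(col) / n                                    # proportion passing the item
        sum_pq += p * (1 - p)
    total_var = statistics.pvariance([sum(row) for row in item_matrix])
    return k / (k - 1) * (1 - sum_pq / total_var)

# Hypothetical true/false responses: 5 people x 4 items
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
print(round(cronbach_alpha(responses), 2), round(kr20(responses), 2))  # 0.8 0.8
```

For Likert-type items only `cronbach_alpha` applies, since p × q is defined only for pass/fail items.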
→ can measurement error be reduced?
  ↳ ways to reduce error
    * test development
      • ensuring wording is clear
    * test environment variables
      • consistency
    * ensure scorers are well trained
→ standard error of measurement ~ $SEM = S\sqrt{1 - r_{xx}}$
  ↳ the degree to which an individual's scores would vary if they were to take the same test numerous times
  ↳ the std. dev. of a theoretically normal distribution of those scores
  ↳ can be used to estimate the range that an individual's "true score" would fall within, given an observed score
  ↳ confidence interval (CI) ~ range around the observed score where their true score is likely to fall
    * e.g. 95% CI ≈ $X \pm 1.96 \times SEM$
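A minimal sketch of the SEM and confidence interval in Python. The test values (SD = 15, reliability = .91, observed score = 110) are hypothetical, and the 1.96 multiplier assumes normally distributed error and a 95% interval.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM = S * sqrt(1 - r_xx)."""
    return sd * math.sqrt(1 - reliability)

def confidence_interval(observed, sd, reliability, z=1.96):
    """Range around an observed score in which the true score is likely to fall.

    z = 1.96 gives an approximately 95% interval, assuming normally distributed error.
    """
    sem = standard_error_of_measurement(sd, reliability)
    return observed - z * sem, observed + z * sem

# Hypothetical test: SD = 15, reliability = .91, observed score = 110
sem = standard_error_of_measurement(15, 0.91)
low, high = confidence_interval(110, 15, 0.91)
print(round(sem, 2), round(low, 2), round(high, 2))  # 4.5 101.18 118.82
```

As reliability approaches 1.0 the SEM shrinks toward 0 and the interval narrows around the observed score.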
→ increasing reliability
  ↳ removal of problematic items
  ↳ N and r effects on $r_{xx}$
→ relationship between reliability and validity
  * reliability ~ how much of the score is "true" rather than "error"
  * validity ~ whether the score reflects what you are trying to measure
→ using knowledge of error
  * allows for the estimation of the degree of discrepancy between a person's test score and their actual true score
→ high reliability necessary when:
  * making decisions about people ~ > .80-.90
  * lower values may be acceptable for preliminary rather than final decisions
  * .50 ~ true scores and error have equal effects on test scores

Class 4, week 4
validity
→ validity ~ denotes the scientific utility of a measuring instrument, broadly statable in terms of how well it measures what it purports to measure
  ↳ reliability
    * consistency
    * easy to assess
  ↳ validity
    * speaks to whether what is being measured is really being measured
    * must "build a case" to assert validity
  ↳ reliability places an upper limit on validity
→ main types of validity
  ↳ construct validity ~ is a judgement
    * construct ~ an idea developed to explain behaviour
    * built up in the test development process
    * evidence for construct validity:
      • test homogeneity ~ measures a single construct (Cronbach's alpha, item-total correlation)
      • convergent validity ~ test scores correlate with scores on established measures of similar or related constructs
      • discriminant validity ~ test scores do not correlate with scores on other measures they ought not to correlate with, as predicted from theory
      • changes over time or as a function of age, as predicted
      • contrasted or distinct groups validity
      • test scores obtained at time 1 and time 2
      • multitrait-multimethod ~ represents multiple traits and multiple methods
  ↳ criterion-related validity
    * predictive ~ occurs when we have a measure that's used to predict performance on some criterion measure
→ decision theory and validity
  * base rate ~ proportion of people in the population who would be expected to succeed
  * hit rate ~ proportion of all decisions that are accurate
  * cut score ~ the test score used in determining a decision
→ six guidelines for writing test items
  * generate a pool of items
  * avoid items that are exceptionally long
  * be aware of the reading level of those taking the test
→ dichotomous format ~ offers two choices for each question
  ↳ disadvantages
    * can promote memorization without understanding
    * limited: many situations are not truly dichotomous
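To make the decision theory terms concrete, here is a hedged Python sketch that scores selection decisions against an outcome. It assumes the rule that anyone at or above the cut score is predicted to succeed; the scores, outcomes, cut score of 50, and the function name are all hypothetical.

```python
def decision_summary(test_scores, succeeded, cut_score):
    """Summarize selection decisions made with a cut score.

    Assumed rule: people scoring at or above cut_score are predicted to succeed.
    test_scores: predictor scores; succeeded: True/False outcome on the criterion.
    """
    n = len(test_scores)
    hits = false_pos = false_neg = 0
    for score, actual in zip(test_scores, succeeded):
        predicted = score >= cut_score
        if predicted == actual:
            hits += 1              # accurate decision (correct accept or correct reject)
        elif predicted and not actual:
            false_pos += 1         # selected, but did not succeed
        else:
            false_neg += 1         # rejected, but would have succeeded
    return {
        "hit_rate": hits / n,                # proportion of all decisions that are accurate
        "base_rate": sum(succeeded) / n,     # proportion who succeed regardless of the test
        "false_positives": false_pos,
        "false_negatives": false_neg,
    }

scores = [45, 52, 60, 38, 70, 55, 48, 65]
succeeded = [False, True, True, False, True, False, True, True]
print(decision_summary(scores, succeeded, cut_score=50))
```

A test is only useful for selection if its hit rate beats what you would get from the base rate alone (here, predicting that everyone succeeds would be right 62.5% of the time; the cut score is right 75% of the time).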
→ item difficulty (p) = % of people passing the item
  ↳ what would be the ideal item difficulty?
    * on a test item with a number of options, some people will answer correctly through simple guessing
    * the ideal difficulty lies about halfway between the chance level and 100%
→ category format
  ↳ "scale of one to ten..."
  ↳ controversy
  ↳ factors that ratings can be affected by
    * context can change the way one responds
  ↳ items that go together can be identified
→ item discrimination
  ↳ extreme groups method ~ compares high and low scorers on an entire test
    * D = H - L
      • H = proportion of high scorers who answered correctly
      • L = proportion of low scorers who answered correctly
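A Python sketch of the item statistics above: difficulty p, the "halfway between chance and 1.0" ideal difficulty, and the extreme groups discrimination index D = H - L. The 27% group size is an assumed convention (the notes do not specify one), and the data and function names are hypothetical.

```python
def item_difficulty(item_responses):
    """p = proportion of people passing the item (1 = correct, 0 = wrong)."""
    return sum(item_responses) / len(item_responses)

def optimal_difficulty(n_options):
    """Halfway between the chance success rate (1 / n_options) and 1.0."""
    chance = 1 / n_options
    return (1 + chance) / 2

def discrimination_index(item_responses, total_scores, group_fraction=0.27):
    """Extreme groups method: D = H - L.

    H / L = proportion answering the item correctly in the top / bottom group
    on the total test. group_fraction (27% by default) is an assumed convention.
    """
    n = len(total_scores)
    k = max(1, int(n * group_fraction))                      # size of each extreme group
    order = sorted(range(n), key=lambda i: total_scores[i])  # lowest total first
    low_group, high_group = order[:k], order[-k:]
    high_p = sum(item_responses[i] for i in high_group) / k  # H
    low_p = sum(item_responses[i] for i in low_group) / k    # L
    return high_p - low_p

# Hypothetical data: one item's responses and each person's total test score
item = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]
totals = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
print(item_difficulty(item), optimal_difficulty(4), discrimination_index(item, totals))
```

For a four-option multiple-choice item, chance success is .25, so the ideal p is about .625. A positive D means high scorers pass the item more often than low scorers.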
→ tests and the criterion
  * individuals' scores are important insofar as they predict the criterion
  ↳ criterion-referenced tests / mastery testing
→ point biserial method ~ correlation between the score on an individual item and the total test score
→ distractors
  ↳ after examining difficulty and discriminability, look at the number of times each distractor is chosen
→ item characteristic curves ~ provide information about how an item relates to the total test across performance levels
→ item response theory ~ a complex method for assessing item performance
  ↳ actual performance compared to expected performance
→ item validity
  * biserial / point biserial correlations
  * compare item-criterion correlations with inter-item correlations
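The point biserial item-total correlation can be sketched in Python with the standard formula below, which equals a Pearson r between the 0/1 item and the total when population SDs are used. The data and function name are hypothetical.

```python
import math
import statistics

def point_biserial(item_responses, total_scores):
    """Point biserial correlation between a 0/1 item and total test scores.

    r_pb = (M_p - M_q) / S_t * sqrt(p * q), where M_p and M_q are the mean
    total scores of people who passed and failed the item, S_t is the
    (population) SD of total scores, p = proportion passing, q = 1 - p.
    """
    passed = [t for i, t in zip(item_responses, total_scores) if i == 1]
    failed = [t for i, t in zip(item_responses, total_scores) if i == 0]
    p = len(passed) / len(total_scores)
    q = 1 - p
    s_t = statistics.pstdev(total_scores)
    return (statistics.fmean(passed) - statistics.fmean(failed)) / s_t * math.sqrt(p * q)

# Hypothetical data: one item and each person's total score
item = [1, 1, 0, 1, 0, 0]
totals = [9, 8, 4, 7, 5, 3]
print(round(point_biserial(item, totals), 3))  # 0.926
```

Here the total includes the item itself, which inflates r slightly; a "corrected" item-total correlation would subtract the item from each total first.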

the Wechsler scales ~ week 8
→ Sir Francis Galton (1822-1911)
  ↳ Hereditary Genius (1869)
  ↳ founder of individual psychology
  ↳ eugenics
→ Charles Spearman (1863-1945)
  ↳ g ~ general intelligence
    * most established predictor of performance
  ↳ s ~ specific intelligence
  ↳ factor analysis
→ Raymond B. Cattell (1905-1998)
  ↳ crystallized ~ reflects learning; the realization of potential through experience
  ↳ fluid ~ represents original potential; analytic abilities
→ intelligence (Wechsler) ~ the ability to act purposefully, think rationally, and deal effectively with the environment
  ↳ intelligence is an aspect of personality
→ Binet scale
  ↳ developed for children in the Paris school system
  ↳ 30 items
  ↳ "idiot," "imbecile," and "moron"
  ↳ normed on 50 "normal" children
→ Stanford-Binet intelligence scale
  ↳ IQ = MA / CA × 100
  ↳ measures fluid reasoning, knowledge, quantitative reasoning
→ WAIS-R → current WAIS edition
→ important differences (Wechsler vs. Binet)
  ↳ Binet scale grouped items by age level, and items of the same type were scattered throughout the test
  ↳ each Wechsler subtest produces a raw score, converted to a scaled score
    * scaled scores: mean = 10, std. dev. = 3
  ↳ nonverbal (performance) subtests examined intelligence in a way not addressed by early Binet scales
  ↳ index scores are calculated by combining the subtest scores
  ↳ moved away from a single score indication of intelligence
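The ratio IQ formula from the Stanford-Binet can be sketched in one function. Ages are taken in months here (an assumption for convenience, so partial years are exact); the example child is hypothetical.

```python
def ratio_iq(mental_age_months, chronological_age_months):
    """Ratio IQ as in the early Stanford-Binet: IQ = MA / CA x 100.

    Ages are given in months so that partial years are handled exactly.
    """
    return mental_age_months * 100 / chronological_age_months

# A hypothetical 10-year-old (120 months) performing like a typical 12-year-old (144 months)
print(ratio_iq(144, 120))  # 120.0
```

A child whose mental age equals their chronological age always gets a ratio IQ of exactly 100.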
→ WAIS-III subtests → verbal IQ score, FSIQ
  * information
→ WAIS-III indexes
  ↳ perceptual organization
    * picture completion * block design * matrix reasoning
  ↳ working memory
    * arithmetic * digit span * letter-number sequencing
  ↳ processing speed
    * digit symbol-coding * symbol search
→ interpretation of WAIS-III
  ↳ index score comparisons
  ↳ pattern analysis
→ applications
  * psychoeducational and neuropsychological (e.g. lateralized deficit, learning disability)
→ psychometric properties
  * standardization, including specialty groups
  * reliability ~ high reliability estimates, including internal consistency
  * validity
→ extensions of IQ testing
  * WISC-I (1949) ~ ages 6-16
  * WPPSI (1967) ~ ages 2.5 - 7 yrs 7 mos

→ what is clinical neuropsychology?
  ↳ examines brain functioning in cognitive and motor domains
  * provides information on how well the brain is functioning in order to complete various tasks
→ brain structures and functions
  * cerebral cortex * lobes * thalamus
  * primary sensory cortex
  * primary visual cortex
  * mathematical computations
→ uses of neuropsychological assessment
  > diagnosis
  > return to work
  > legal
  > driving
  > assessment of cognitive decline
  > assessment of competency
→ developmental neuropsychology
  ↳ types of tests for children
  * behavioural issues
→ neuropsychological assessment
  * hx (history)
  * psychosocial
  * personality
  * neuropsychological functions
  ↳ connecting them to services to help overcome such deficits
→ neuropsychological deficits (a- = absence of)
  ↳ acalculia ~ inability to perform arithmetic calculations
  ↳ agnosia ~ deficit in recognizing sensory stimuli
    * prosopagnosia ~ unable to distinguish faces
  ↳ alexia ~ inability to read
  ↳ apraxia ~ inability to carry out purposeful movements in the absence of paralysis
→ neuropsychological functions
  > lateralized signs
  > motor
  > sensory
  > thought processes
  > language
  > memory
  > attention
  > executive functioning
→ neuropsychological testing
  ↳ fixed battery approach
    • tactual performance test
    • rhythm test
    • speech sounds perception test
    • trail making test
    • hand dynamometer
    • grooved pegboard
    • sensory perceptual exam
    • frequently added: WAIS, MMPI
  ↳ flexible battery approach ~ core set of tests common across patients, with tests added or subtracted as needed
→ conditions that can cause neuropsychological deficits
  * head injury
  * stroke / CVA
  * dementia
  * other neurotoxins
  * seizure disorder
  * psychiatric disorders
  * ADHD
  * and more
  ↳ Luria-Nebraska battery
    • tactile • visual • memory • reading
    • receptive speech • expressive speech
    • intellectual processes • arithmetic skills

testing in healthcare 2 ~ week 10
→ tests by domain
  ↳ intellectual
    * WAIS * Kaufman * Stanford-Binet
  ↳ achievement
  ↳ motor
    * finger oscillation * grip strength * grooved pegboard
  ↳ visual-spatial
    * judgement of line orientation * clock drawing test * block design * Hooper visual organization test
  ↳ language
    * vocabulary * token test * multilingual aphasia exam * Boston naming test * PPVT (non-verbal vocab test)
  ↳ memory
    * recognition memory test (RMT) * Wechsler memory scale-III (WMS-III)
  ↳ learning
    * Hopkins verbal learning test * Buschke selective reminding test
  ↳ emotional
    * geriatric depression scale * personality assessment inventory
  ↳ sensory
  ↳ executive functioning
    * digit span * category test (problem solving test) * Wisconsin card sorting test * trails B
  ↳ performance validity
    * test of memory malingering * Rey 15-item test
  ↳ symptom validity
    * personality assessment inventory * structured inventory of malingered symptoms
  * some tests look at more than just totalling right and wrong answers
  * sensitive to conditions such as Alzheimer's or Huntington's
→ single construct measures
  ↳ stress
    * estimated to be involved in 50% to 80% of illnesses
  ↳ anxiety
    * an emotional state
    * State-Trait Anxiety Inventory (STAI) ~ produces separate scores for state and trait anxiety
    * promising psychometrics
→ ecological momentary assessment (EMA)
  * occurs in the natural environment
→ non-proprietary, publicly available measures
  * 3 to 85 years of age
→ the rationale for cognitive-behavioural assessment
  * treats behaviours, thoughts, and physiological processes as the problem (rather than as symptoms of an underlying cause)
  * more direct than traditional psychological tests
→ traditional vs. cognitive-behavioural assessment
  ↳ traditional
    * symptoms are superficial
    * indirect; not related to treatment
  ↳ cognitive-behavioural
    * symptoms/behaviours are the focus of treatment
    * direct; related to treatment
→ health-related quality of life
  * premature mortality not desirable
  ↳ WHO: "health is a complete state of physical, mental, and social well-being and not merely the absence of disease"
  ↳ two major approaches
    * psychometric
    * decision theory
→ early procedures based on operant conditioning
  * multiple steps:
    > identify critical behaviours
    > determine frequency or intensity (i.e., obtain a baseline)
    > if excesses, attempt to decrease frequency or intensity of behaviours; if deficits, attempt to increase those behaviours in the future
    > employ interventions to increase or decrease the behaviour as needed
    > determine what change in behaviour has occurred and adjust intervention
  * used for a variety of problems; examples include smoking, poor study habits, addiction, and diet
→ assertiveness
  * ability to stand up for or speak up for oneself appropriately
  * distinguished from aggressiveness (e.g., temper tantrum)
  * situation that would typically call for an assertive response
  * psychometric data on self-report instruments
→ self-report techniques
  * early pencil-and-paper tests are very reminiscent of traditional procedures
  * typically have problems with face validity and all of its complications
  * very few of them have been subjected to adequate, well-designed research
  * involve considering a list of statements about a given situation (often true/false)
  ↳ traditional method: a person is always fearful
  ↳ cognitive-behavioural method: a person is fearful only in certain circumstances
  * different versions for different age groups
  * cross-cultural studies have been done
→ schemas ~ cognitive frameworks that guide our knowledge, beliefs, and actions
  * negative (dysfunctional) schemas underlie pathological fear responses
→ Irrational Beliefs Test
  * likewise focuses on irrational or unrealistic beliefs
  * the Irrational Beliefs Test (Jones 1968) was a 100-item scale
  * agree or disagree with statements using a 5-point scale; contains subscales
  * adequate psychometrics
  * a variety of uses in clinical settings
→ psychophysiological procedures
  ↳ physiological variables with treatment implications
  * some examples of such measurement: blood pressure, heart rate, galvanic skin response (GSR)
  ↳ evaluation of psychophysiological testing
    * what changes are considered significant?
→ cognitive functional analysis
  * the basis of cognitive functional analysis is:
    > environmental antecedents
    > internal dialogue
    > environmental consequences
  > treatment of phobias: exposure-based interventions

computers and basic psychological science
→ two basic ways to use computers in testing
  * administer, score, and interpret tests
  * create new tasks to test abilities that traditional tests did not
→ computer-administered tests
  * can perform in a manner equal to or more valid than pencil-and-paper responses
  * yield evaluations comparable to pencil-and-paper formats
  * benefits
    > standardized questions (same as a written version)
    > lack of social desirability effect
    > reduces embarrassment for delicate topics
    > time and cost reduction; cost-effective and less time-consuming
  * not suitable for all types of testing situations
→ computer adaptive testing
→ computer diagnosis, scoring, and reporting of results
  * compared to that of humans
→ internet usage for psychological testing
  * the internet is a thriving source for psychological testing
→ why is test bias controversial?
  * environmental factors
→ the traditional defense of testing
  ↳ differential validity ~ whether validity differs when different groups are considered
    * is it really valuable?
  ↳ content-related evidence for validity
    * items on intelligence tests may unfairly favour or hinder certain groups
    * Flaugher (1978) ~ trends in scores
    * Drasgow et al. (2016) found that purifying tests
→ Educational Testing Service (ETS)