PSYC 3200

Tests & Measurement

Class 1, week 1
tests and measurement
→ historical overview
↳ ancient (2200 BCE)
* China
* civil service exams
↳ 19th century
* Germany * France
* England * North America
↳ individual differences
* Darwin * Cattell ~ mental test
* Galton
↳ experimental
* Fechner * Wundt ~ psych. lab, founder
↳ mental measurement
* Francis Galton
• inheritance of genius (e.g. reaction times)
• co-relations
• believed in eugenics
↳ James McKeen Cattell
* studied under Wundt
* worked w/ Galton
* brought their ideas + methodology to U.S.
* mental measurement
↳ Alfred Binet
* commissioned by French gov. to develop a test to determine if a child would benefit from standard classroom instruction
* developed the first intelligence test in 1905
↳ WWI
* army alpha test - literate
* army beta test - illiterate
↳ personality testing
* structured (self-report items)
• Woodworth personal data sheet
• MMPI
* projective
• Rorschach inkblot test
Class 2, week 2
statistics review
→ scales of measurement
↳ nominal
* not numerical, numbers are labels
↳ ordinal
* rank ordering
↳ interval
* equal intervals, no absolute zero
↳ ratio
* equal intervals with an absolute zero
→ frequency distributions
↳ displays scores showing how often each one occurs in a data set
→ describing distributions
↳ measures of central tendency
* mean ~ average
* median ~ middle score
* mode ~ most common score
↳ measures of variability
* range ~ highest score - lowest score
* variance ~ $\sigma^2 = \frac{\sum (X - \bar{X})^2}{n}$ ($\bar{X}$ = mean)
* standard deviation ~ represents the average amount of deviation from the mean
$\sigma = \sqrt{\frac{\sum (X - \bar{X})^2}{n}}$
↳ measures of symmetry and "peakedness"
* skewness
• positive ~ the tail is pointed toward the high (+) end, most observations are low
• negative ~ the tail is pointed toward the low (-) end, most observations are high
* peakedness is measured by kurtosis
• flat distribution ~ platykurtic
• high peak ~ leptokurtic
• normal ~ mesokurtic
→ transformations
↳ percentiles
↳ z-score = (observed score - mean) / stan. deviation
* how far away from the mean the score is, in standard deviation units
↳ t-score
* mean = 50
* std dev = 10
↳ standard scores (IQ scores)
* mean = 100
* std dev = 15
→ percentile ranks
↳ determines how many items in a data set fall below a given point
e.g. placed 62/63 runners in a race.
1/63 = .016
percentile rank = .016 × 100 = 1.6 percentile
$P_r = \frac{B}{N} \times 100$ = percentile rank of $X_i$ (B = number of scores below $X_i$, N = total number of scores)
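A minimal Python sketch of the transformations and the percentile rank formula above. The raw scores are made up for illustration, and the helper names (`z_score`, `rescale`, `percentile_rank`) are assumptions, not part of the course material. The population standard deviation (divide by n) is used so that it matches the variance formula in these notes.

```python
from statistics import mean, pstdev


def z_score(x, scores):
    """Distance of x from the mean, in standard deviation units."""
    return (x - mean(scores)) / pstdev(scores)


def rescale(z, new_mean, new_sd):
    """Re-express a z-score on a scale with the given mean and SD."""
    return new_mean + new_sd * z


def percentile_rank(x, scores):
    """Percentage of scores that fall below x: Pr = (B / N) * 100."""
    below = sum(1 for s in scores if s < x)
    return below / len(scores) * 100


# Hypothetical raw scores with mean 5 and (population) SD 2.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
z = z_score(9, scores)      # 2.0 SD above the mean
t = rescale(z, 50, 10)      # T-score scale: mean 50, SD 10
iq = rescale(z, 100, 15)    # IQ-style standard score: mean 100, SD 15

# Runner example from the notes: placed 62nd of 63.
# Performance is coded as the negative finishing place, so a better
# finish is a higher number and only one runner falls below 62nd.
performance = [-place for place in range(1, 64)]
runner_pr = percentile_rank(-62, performance)

print(z, t, iq, round(runner_pr, 1))
```

The same `rescale` helper covers both the T-score and the IQ-style standard score, since they differ only in the mean and standard deviation chosen for the new scale.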
→ percentiles
↳ similar to percentile rank
* percentile deals in raw score units
* is a measure of relative performance
→ quartiles and deciles
↳ quartiles divide the distribution into equal fourths
* interquartile range is the interval of scores between the 25th and 75th percentiles; represents the middle 50% of the distribution
* deciles divide the distribution into 10 equal groups
→ normative samples
↳ scores obtained from a sample of individuals on a particular test
↳ used as a reference group, to which scores can be compared to assess examinees' relative position in a population
→ group-based norming
↳ aka: demographically corrected norms
↳ somewhat controversial
→ correlation and regression
↳ correlation ~ measure of the linear association between two variables
* graphed by scatter plot
* values range from -1 to +1
* significance testing
• influenced by N
* types of correlation coefficients
• Pearson Product Moment (PPM)
> relationship between 2 continuous variables
• Spearman's Rho ($\rho$)
> 2 sets of ordinal data
• Biserial
> relationship between a continuous and an artificial dichotomous variable
° true dichotomous - naturally from 2 categories
° artificial dichotomous - underlying continuous scale forced into a dichotomy
↳ regression
* regression line
* regression equation (y = mx + b)
• slope
• intercept
* prediction
* linear relationships
* regression to the mean
* range restriction - restricted variability decreases the likelihood of finding significant correlations
→ coefficient of determination - $r^2$, which reflects the amount of variance in the criterion that is accounted for by the predictor
→ coefficient of alienation - $1 - r^2$, which represents the amount of variance not accounted for (the coefficient itself is usually defined as $\sqrt{1 - r^2}$)
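The correlation and regression ideas above can be sketched in a few lines of Python. The data (hours studied and marks) are hypothetical, and the function names are illustrative assumptions; this is a plain least-squares sketch, not a full statistics routine.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two continuous variables."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def fit_line(x, y):
    """Least-squares slope m and intercept b for y = m*x + b."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    m = sxy / sxx
    return m, my - m * mx


# Hypothetical predictor (hours studied) and criterion (marks).
hours = [1, 2, 3, 4, 5]
marks = [52, 55, 61, 64, 68]

r = pearson_r(hours, marks)
r2 = r ** 2                 # variance in the criterion accounted for
unexplained = 1 - r2        # variance not accounted for
m, b = fit_line(hours, marks)
predicted = m * 6 + b       # prediction for a new value of the predictor

print(round(r, 3), round(r2, 3), round(unexplained, 3), m, b, predicted)
```

With only five points the estimate of r is unstable, which is one reason the notes flag that significance testing is influenced by N.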
→ multiple regression
* can have any number of predictors
* each regression coefficient represents the unique contribution of IV in predicting DV
* more often see standardized regression coefficients as well as unstandardized coefficients
↳ cross validation
* capitalization on chance
* shrinkage
* independent replications
* randomly splitting samples
↳ factor analysis
* can a set of measures be reduced to a smaller set?
* can the resulting "new" variables be interpreted?
* role in measurement

Class 3, week 3
reliability
→ reliability ~ the consistency of scores obtained by the same persons
↳ the first requirement of a "good" test
* consistent and replicable
* two measurements should lead to approx. the same results
↳ variability
* between people
* with themselves
→ classical test theory
↳ goals
* estimate errors in measurement
* improve tests to minimize errors
↳ measurement error ("noise")
* caused by factors that randomly influence the measurement of a variable across the sample
* may increase or decrease scores
* not necessarily "random" to the individual
* adds variability to the data but does not affect average group performance
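The last point about measurement error can be illustrated with a small simulation. This is a sketch under stated assumptions: observed score = true score + error, with error drawn from a normal distribution centred on zero and independent of the true score. The means, SDs and sample size are arbitrary illustrative choices.

```python
import random
from statistics import mean, pvariance

random.seed(0)  # fixed seed so the sketch is repeatable

N = 10_000
# Hypothetical true scores and random error ("noise").
true_scores = [random.gauss(100, 15) for _ in range(N)]
errors = [random.gauss(0, 5) for _ in range(N)]

# Classical test theory: each observed score is true score plus error.
observed = [t + e for t, e in zip(true_scores, errors)]

# Error raises or lowers individual scores, so the spread grows,
# but the group average stays close to the average true score.
print(round(mean(true_scores), 2), round(mean(observed), 2))
print(round(pvariance(true_scores), 1), round(pvariance(observed), 1))
```

Individual observed scores move up or down, yet the two means stay close while the observed variance is larger than the true-score variance.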
→ sources of error
↳ error due to the instrument
↳ content heterogeneity
↳ time sampling
↳ testing / experimental conditions
↳ person-oriented conditions
→ estimating reliability
↳ correlation coefficient (r)
* measures degree of relationship between two sets of scores
↳ reliability coefficient ($r_{xx}$)
* index of measurement consistency
* index of the relative influence of true score and error scores on obtained test scores
* range of values: 0-1 (only positive)
$r_{xx} = \frac{\sigma^2_T}{\sigma^2_X} = \frac{\sigma^2_T}{\sigma^2_T + \sigma^2_E}$
→ methods of assessing reliability
↳ alternate forms / parallel tests ($r_{form1,form2}$)
* 2 different versions of the test
* statistical considerations for parallel tests
• have the same standard deviation
• correlate with the same set of true scores
• error is truly random
* forms are correlated
* test construction
* limitations
• expensive
• time consuming
• difficult to create equivalent forms that contain the same number and type of items
↳ test-retest reliability ($r_{time1,time2}$)
* administer same measure to same sample at two different times
* correlate first set of scores with second
* limitations
• genuine change between time one and two
• reactivity - experience may influence results
• time frame (too far apart / too close together)
↳ split-half reliability
* items randomly divided into two sets and calculate correlation between two sets
→ homogeneity / internal consistency
* split-half reliability
* Cronbach's Alpha / KR-20
• a measure of inter-item consistency - the degree to which items on a particular measure relate to each other
• equivalent to average of all possible split-half estimates
• estimate using formula that provides the average inter-item correlation for a set of items
• KR-20 used for dichotomous items only (e.g. T/F)
• Cronbach's alpha is a more general case of KR-20 and can be used for dichotomous items or continuously scaled items (e.g., Likert-type)
→ inter-rater reliability ($r_{rater1,rater2}$)
* assesses the degree to which different raters provide similar estimates of the same phenomenon (e.g., panel interview)
* estimate two ways
• if ratings are continuous, correlation between raters' ratings
• if ratings are categorical (yes/no), calculate the % agreement between raters
• Kappa statistic ~ indicates the actual agreement as a proportion of the potential agreement following correction for chance agreement
* limitations
• hard to get 2 or more raters to be consistent
• rater-ratee effects can be difficult to overcome
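Two of the reliability estimates named in these notes, Cronbach's alpha and the kappa statistic, can be sketched in Python. The formulas used are the standard ones, which the notes do not write out: alpha = k/(k-1) × (1 - sum of item variances / variance of total scores), and kappa = (observed agreement - chance agreement) / (1 - chance agreement). All data and function names below are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(rows):
    """Internal consistency; rows = one list of item scores per person."""
    k = len(rows[0])
    item_vars = [pvariance([row[i] for row in rows]) for i in range(k)]
    total_var = pvariance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def percent_agreement(r1, r2):
    """Proportion of cases on which two raters give the same category."""
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)


def cohen_kappa(r1, r2):
    """Agreement between two raters, corrected for chance agreement."""
    n = len(r1)
    p_o = percent_agreement(r1, r2)
    categories = set(r1) | set(r2)
    p_e = sum((r1.count(c) / n) * (r2.count(c) / n) for c in categories)
    return (p_o - p_e) / (1 - p_e)


# Hypothetical Likert-type responses: 5 people x 3 items.
likert = [[3, 4, 3], [5, 5, 4], [2, 2, 3], [4, 5, 5], [1, 2, 2]]
alpha = cronbach_alpha(likert)

# Hypothetical yes/no ratings of 10 candidates by two raters.
rater1 = ["y"] * 5 + ["n"] * 5
rater2 = ["y", "y", "y", "y", "n", "n", "n", "n", "n", "y"]
agreement = percent_agreement(rater1, rater2)
kappa = cohen_kappa(rater1, rater2)

print(round(alpha, 3), agreement, round(kappa, 3))
```

In this example the raters agree on 80% of cases, but kappa is lower because half of that agreement would be expected by chance alone.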
→ can measurement error be reduced?
↳ ways to reduce error
* test development
• ensuring wording is clear
• pilot test measures
* test administration
• test environment variables
• consistency
* test scoring and interpretation
• ensure scorers are well trained
• ensure data are entered accurately
→ increasing reliability
↳ increase the number of items
↳ removal of problematic items
↳ factor analysis
→ N and r effects on $r_{xx}$
→ using knowledge of error
↳ reliability coefficient
* the proportion of observed score variance that is "true" rather than "error"
* standard error of measurement (SEM)
• allows for the estimation of the degree of closeness between an individual's observed test score and their actual true score
→ standard error of measurement
$SEM = S\sqrt{1 - r_{xx}}$
↳ the degree to which an individual's scores would vary if they were to take the same test numerous times
↳ the std. dev. of a theoretically normal distribution of test scores obtained by one person on equivalent tests
↳ can be used to estimate the range that an individual's "true score" would fall within, given a specific level of confidence
↳ confidence interval (CI)
* the range of scores around an individual's observed score where their true score is likely to be
→ relationship between reliability and validity
↳ relates to test validity - the degree to which a score reflects what you are trying to measure
[figure: four panels labelled "high reliability, low validity", "high reliability, low validity", "no reliability, no validity", "high reliability, high validity"]
→ how reliable should tests be?
↳ high reliability necessary when:
* tests are used to make final decisions
↳ lower reliability is acceptable when:
* preliminary rather than final decisions
↳ general guidelines
* > .80 - .90 ~ decisions about people
* > .70 ~ research purposes
* .50 ~ true scores and error have equal effects on test scores
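A short Python sketch of the standard error of measurement, $SEM = S\sqrt{1 - r_{xx}}$, and the confidence interval built from it. The test statistics (SD of 15, reliability of .91, observed score of 110) are hypothetical, and the multiplier 1.96 is the usual normal-curve value for a 95% interval, which is an assumption not stated in the notes.

```python
from math import sqrt


def standard_error_of_measurement(sd, reliability):
    """SEM = S * sqrt(1 - rxx)."""
    return sd * sqrt(1 - reliability)


def confidence_interval(observed, sem, z=1.96):
    """Range around the observed score where the true score is likely to be."""
    return observed - z * sem, observed + z * sem


# Hypothetical test: SD = 15, reliability coefficient = .91.
sem = standard_error_of_measurement(15, 0.91)
low, high = confidence_interval(110, sem)

# A less reliable test gives a larger SEM and so a wider interval.
sem_low_reliability = standard_error_of_measurement(15, 0.70)

print(round(sem, 2), round(low, 2), round(high, 2), round(sem_low_reliability, 2))
```

The comparison at the end shows why high reliability matters for final decisions about people: the interval around an observed score widens as $r_{xx}$ drops.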
Class 4, week 4
validity
→ validity ~ denotes the scientific utility of a measuring instrument, broadly statable in terms of how well it measures what it purports to measure
→ contrasted w/ reliability
↳ reliability
* consistency
* precision and dependability
* easy to assess
↳ validity
* speaks to whether what is being measured is really being measured
* must "build a case" to assert validity
↳ reliability is necessary but not sufficient and places an upper limit on validity
→ main types of validity
↳ construct validity ~ is a judgement about the extent to which a test measures a theoretical construct
* construct - an informed, scientific idea developed to explain behaviour
* construct validation process
• reflects role of psychological theory in the test development process
• hypotheses about what test scores should (not) relate to
* test homogeneity
• measures a single construct
• Cronbach's alpha, item-total correlation
* convergent validity
• test scores correlate with scores on other measures, as predicted from theory
• correlate new measure with more established measure of similar or related constructs
* discriminant validity
• test scores do not correlate with scores on other measures they ought not to correlate with, as predicted from theory
* changes over time or with age
• test scores increase or decrease over time or as a function of age as predicted
* contrasted or distinct groups validity
• scores obtained by people of different groups differ as predicted
* pre-test / post-test changes
• test scores obtained at time 1 and time 2 differ as theoretically predicted
* the matrix (multitrait-multimethod matrix)
↳ represents multiple traits and methods within a test matrix
↳ helps establish convergent and discriminant validity evidence, and other data

* content validity - the extent to which a test measures the universe of content
• used to assess achievement-based tests
* criterion-related validity ~ validity that is based on some external criterion measure
• predictive ~ occurs when we have a measure that's used to predict performance on some criterion measure in the future (using test scores to predict)
• concurrent ~ measures at the same point in time (correlating test scores w/ GPA)
* face validity ~ the degree to which a measure appears to measure the traits it claims to measure
→ decision validity and theory
↳ tests assist in decision making
* base rate ~ the proportion of people in the population who would be expected to perform well
* hit rate ~ proportion of all decisions that are accurate
* cut score ~ test score determining whether one passes (at or above) or fails (below)

Class 6, week 6
item writing and analysis
→ item writing
↳ six guidelines for writing test items
* define clearly what you wish to measure
* generate pool of items
* avoid items that are exceptionally long
* be aware of the reading level of those taking the scale and the reading level of the items
* avoid items that convey two or more ideas at the same time
* consider using questions that mix positive and negative wording
→ the dichotomous format
↳ this approach offers two choices for each question
* appears on educational as well as personality tests
* e.g., yes/no, true/false
↳ advantages
* simplicity
* often requires absolute judgement
↳ disadvantages
* can promote memorization without understanding
* many situations are not truly dichotomous
* 50% chance of getting any item right even if material is not known
→ the polytomous format
↳ similar to dichotomous method, but has more than two options
↳ most common example is multiple-choice
* incorrect options are called distractors
↳ guessing
* on a test item with a limited number of responses, a certain number can be answered correctly through simple guessing
→ the likert format
↳ offers a continuum of responses that allow for measurements of attitudes on various topics
↳ open to factor analysis, and groups of items that go together can be identified
→ the category format
↳ similar to likert but with a greater number of choices
↳ "scale of one to ten..."
↳ controversy
* ratings can be affected by factors that can threaten the validity
* context can change the way one responds
↳ visual analogue scales
→ criterion-referenced tests and mastery
↳ criterion-referenced testing
* depends upon the purpose of the test
* individuals' scores are important insofar as they predict the criterion
↳ mastery testing - mastery of content
→ individual differences
↳ item validity
* point biserial
↳ inter-item correlations
* compare item-criterion correlations with inter-item correlations
→ item difficulty index
↳ ID (p) = % of test takers passing the item
↳ test difficulty = average item difficulty
↳ what would be the ideal item difficulty?
* ideal is 0.5 or, mid-point between 1.0 and chance success rate
→ item-discrimination index (d)
↳ how well an item distinguishes between high and low scorers on an entire test
→ extreme groups method
↳ compares performance of high and low scorers on each item
↳ D = H - L
* H = % of high scorers answered correct
* L = % of low scorers answered correct
* d ranges between -1 and 1
→ item discrimination
↳ item-total correlation method
* correlation between the score on an individual item and the total test score
→ analysis of distractors
↳ after examining difficulty and discriminability, look at the number of times each distractor was chosen by the high and low scoring groups
→ item characteristic curves
↳ provide information about how an item relates to the total test across performance levels
→ item response theory
↳ a complex method for assessing item performance
↳ actual performance is compared to expected performance
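The item difficulty and extreme-groups discrimination indexes from the item analysis notes can be sketched as follows. The response matrix is hypothetical (1 = correct, 0 = incorrect), and using the top and bottom thirds as the "high" and "low" groups is an assumption, since the notes do not say how the extreme groups are cut.

```python
def item_difficulty(responses, item):
    """p = proportion of test takers who passed the item."""
    return sum(person[item] for person in responses) / len(responses)


def optimal_difficulty(n_options):
    """Mid-point between 1.0 and the chance success rate."""
    chance = 1 / n_options
    return (1.0 + chance) / 2


def discrimination_index(responses, item, fraction=1 / 3):
    """D = H - L, comparing high and low scorers on the whole test."""
    ranked = sorted(responses, key=sum, reverse=True)
    size = max(1, round(len(ranked) * fraction))
    high, low = ranked[:size], ranked[-size:]
    h = sum(person[item] for person in high) / size
    l = sum(person[item] for person in low) / size
    return h - l


# Hypothetical answers: 6 people (rows) x 5 items (columns).
responses = [
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]

p_item2 = item_difficulty(responses, 2)
d_item2 = discrimination_index(responses, 2)
d_item0 = discrimination_index(responses, 0)
test_difficulty = sum(item_difficulty(responses, i) for i in range(5)) / 5

print(p_item2, d_item2, d_item0, test_difficulty, optimal_difficulty(4))
```

Here the middle item separates high and low scorers perfectly, while the easiest item discriminates less because some low scorers also pass it.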
the Wechsler scales ~ week 8
→ Sir Francis Galton (1822-1911)
↳ "hereditary genius" (1869)
↳ founder of individual psychology
↳ eugenics
↳ inventor of fingerprint identification
↳ half cousin of Charles Darwin
→ Charles Spearman (1863-1945)
↳ student of Wundt
↳ two factor theory of intelligence
> g ~ general intelligence, most established predictor of performance
> s ~ specific intelligence
↳ factor analysis
→ Raymond B. Cattell (1905-1998)
↳ crystallized and fluid intelligence
* c - knowledge over time
* f - abilities that allow new knowledge
↳ the aggregate, or global capacity to act purposefully, think rationally, and deal effectively with the environment (Wechsler's definition of intelligence)
↳ intelligence is an aspect of personality, rather than an isolated entity
→ Alfred Binet (1857-1911)
↳ Binet-Simon scale
↳ Stanford-Binet scale
→ Binet-Simon scale
↳ established in order to identify mentally disabled children in the Paris school system
↳ first major measure of intelligence
↳ 30 items
↳ "Idiot", "Imbecile", and "moron"
↳ normed on 50 "normal" children
→ Stanford-Binet intelligence scale
↳ IQ = MA / CA × 100
* first used in the 1916 version
↳ currently on the 5th edition
* measures fluid reasoning, knowledge, quantitative reasoning, visual-spatial processing, working memory
↳ crystallized abilities
* reflect learning, the realization of potential through experience
↳ fluid-analytic abilities
* represents original potential, the ability to acquire crystallized abilities
→ Wechsler intelligence scales
↳ Wechsler-Bellevue (1939)
↳ WAIS
↳ WAIS-R
↳ WAIS-III
↳ WAIS-IV (current)
↳ moved away from a single score indication of intelligence
↳ considered the role of "non-intellective" factors
↳ challenged Binet scale inappropriateness for use with adults
→ Wechsler vs. Stanford-Binet
↳ important differences
* point scale
> Binet scale grouped items by age level, and if minimum questions were not successfully answered, no credit was received

> different types of questions were also scattered throughout the test
> in a point scale, credit (points) are awarded for each item
* performance scale
> examined nonverbal intelligence in a way not addressed by early Binet scales
> original Wechsler test had verbal and performance
> most recent versions have as many as 5 scales
> it could reduce bias from language and culture
→ WAIS-III subtests and indexes
verbal comprehension: vocabulary, similarities, information
perceptual organization: picture completion, block design, matrix reasoning
working memory: arithmetic, digit span, letter-number sequencing
processing speed: digit symbol-coding, symbol search
(the four indexes combine → FSIQ)
→ scores
↳ each subtest produces a raw score based on the number of correct answers given
↳ raw score is converted to an index scale score:
* mean = 10
* std. dev. = 3
↳ index scores are calculated by combining the index scale scores on the subtests contained in that index
* mean = 100
* std. dev. = 15
→ index scores
↳ verbal comprehension
* more subtle and informative than the original verbal IQ score
* a measure of crystallized intelligence
↳ perceptual organization
* a measure of fluid intelligence
↳ working memory
* most important development of recent IQ tests
↳ processing speed
* how quickly the mind works
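The two scoring ideas in the week 8 notes, the Stanford-Binet ratio IQ (IQ = MA / CA × 100) and the Wechsler-style scores with fixed means and standard deviations, can be compared in a short Python sketch. The ages and scores are hypothetical and the function names are illustrative. Converting a raw score to a Wechsler scaled score relies on published norm tables, so the sketch only shows where a given score sits relative to its scale.

```python
def ratio_iq(mental_age, chronological_age):
    """Ratio IQ: mental age divided by chronological age, times 100."""
    return mental_age / chronological_age * 100


def sd_units(score, scale_mean, scale_sd):
    """How many standard deviations a score sits from the scale mean."""
    return (score - scale_mean) / scale_sd


# Hypothetical children, both aged 8.
advanced = ratio_iq(10, 8)   # mental age above chronological age
delayed = ratio_iq(6, 8)     # mental age below chronological age

# A subtest score of 13 (mean 10, SD 3) and an index score of 115
# (mean 100, SD 15) describe the same relative position.
subtest_position = sd_units(13, 10, 3)
index_position = sd_units(115, 100, 15)

print(advanced, delayed, subtest_position, index_position)
```

Because the ratio depends on chronological age, it becomes hard to interpret once mental age stops rising with age, which is a commonly cited reason the ratio IQ suits children better than adults and is in line with the point in the notes that Wechsler challenged the Binet scale's use with adults.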
→ interpretation of WAIS-III
↳ index score comparisons
* if scores are similar then interpretation depends upon the level of the scores
* if they are significantly different could reflect lateralized deficit, learning disability, or individual differences (SES)
↳ pattern analysis
* relatively large discrepancies between subtest scaled scores can be evaluated and provide information about different problems
* research into validity of pattern analysis has been inconsistent and inconclusive
* until there is more confirmatory research, this should be done for hypothesis building only
↳ psychometric properties
* standardization
> 2200 adults in 13 age groups and 13 specialty groups
* reliability
> high reliability estimates for both internal and temporal reliability
* validity
> considered the most valid in the world for IQ testing
↳ extensions
* WISC-I (1949) - ages 6-16
* WPPSI-IV (1967) - ages 2.5 - 7y 7m
* WASI ~ contains 2 to 4 scales
↳ applications
* psychoeducational
* neuropsychological
* job selection / promotion
* disability
* psychiatric diagnostic
* treatment effectiveness
* research

testing in healthcare ~ week 9
→ what is clinical neuropsychology
↳ studies the relationship between behaviour and brain functioning in cognitive, motor, sensory, and emotional realms
↳ overlaps with neurology and psychiatry
↳ involves the assessment and treatment of those diagnosed with or suspected of having disorders of the central nervous system
↳ history
* Broca and Wernicke
* WWI and WWII
* significant growth in the 1970s and 1980s
* relatively recent advances in computerized assessment
↳ what is neurological assessment?
* different from neuroimaging; provides a unique set of information
* provides information on how well the brain is functioning in order to complete various cognitive and affective tasks
* what is it used for?
> diagnosis
> return to work
> legal
> driving
> treatment recommendations
> post-surgical change
> assessment of strength and weakness
> assessment of cognitive decline
> assessment of competency
↳ types of neuropsychological tests for children
* tests that assess general development and adaptive functions
* tests that estimate attention and executive functions
↳ neuropsychological testing can also be used to identify learning disabilities, such as dyslexia
* children with these issues are legally entitled to services to help them overcome such challenges
→ neuropsychological deficits (a- = absence of...)
↳ Acalculia ~ inability to perform arithmetic calculations
↳ Agnosia ~ deficit in recognizing sensory stimuli
* prosopagnosia - unable to distinguish faces
↳ Alexia ~ inability to read
↳ Amnesia ~ loss of memory
↳ Aphasia ~ deficit in language
↳ Apraxia ~ voluntary movement disorder in the absence of paralysis
↳ conditions that can cause neuropsychological deficits
* head injury
* seizure disorder
* stroke / CVA
* psychiatric disorders
* dementia
* ADHD
* HIV / AIDS
* substance abuse
* neurodevelopmental disorders
* other neurotoxins and more
→ the biological basis of behaviour
[figure: brain diagram; labels include cerebral cortex, parieto-occipital lobe, thalamus]
↳ frontal lobe
* foresight
* speech
* problem solving
* primary motor cortex
* inhibitory control
* abstract thought
* concentration and attention
↳ temporal lobe
* primary auditory cortex
* hippocampus (memory)
* Wernicke's area (speech and comprehension)
* visual association area
↳ parietal lobe
* primary sensory cortex
* mathematical computations
* visual spatial
↳ occipital
* primary visual cortex
→ developmental neuropsychology
↳ testing of children can present unique challenges
* child's ability to adapt to new situations
* brain development
* neuroplasticity
* behavioural issues
→ neuropsychological assessment
↳ the neuropsychological interview
* medical hx
* psychiatric hx
* developmental milestones
* family hx
* psychosocial hx
> home
> academic
> work
> legal
* presenting complaint hx
* personality
* behavioural observations
> lateralized signs
> motor
> sensory
> thought processes
> language
> memory
> attention
> executive functioning
> affect
↳ neuropsychological testing
* fixed battery approach
> exact same test for every patient
> Halstead-Reitan battery (most common)
• Halstead category test
• finger oscillation test
• tactual performance test
• WAIS
• rhythm test
• MMPI
• speech sounds perception test
> frequently added
• trail making test
• hand dynamometer
• grooved pegboard
• sensory perceptual exam
• CVLT-II or III
• WMS-II
* flexible battery approach
> core set of tests that are common across most (or all) patients, but tests can be added or subtracted as needed
> Luria-Nebraska battery
• motor functions
• rhythm
• tactile
• visual
• memory
• reading
• receptive speech
• expressive speech
• intellectual processes
• arithmetic skills
→ domains of neuropsychological testing
↳ intellectual
↳ achievement
↳ motor
↳ language
↳ learning
↳ personality
↳ attention and concentration
↳ executive functioning
↳ sensory
↳ perceptual
↳ visual spatial
↳ memory
↳ emotional
↳ performance validity
* test of memory malingering
* Rey 15-item test
↳ symptom validity
* personality assessment inventory
* structured inventory of malingered symptoms

testing in healthcare 2 ~ week 10
→ neuropsychological tests
↳ intellectual
* WAIS
* Kaufman
* Stanford-Binet
↳ achievement
* Woodcock-Johnson
* WIAT
* wide range achievement test
↳ motor
* finger oscillation
* grip strength
* grooved pegboard
↳ sensory perceptual
* sensory perceptual test
↳ visual spatial / visuo-constructional
* judgement of line orientation
* clock drawing test
* block design task
* Hooper visual organization
↳ language
* vocabulary
* token test
* multilingual aphasia exam
* Boston naming test
* controlled word association test
* PPVT (non-verbal vocab test)
* WRAT reading, spelling
↳ memory
* California verbal learning test-III (CVLT-III)
* recognition memory test (RMT)
* Rey complex figure test (RCFT)
* sentence repetition test
* Wechsler memory scale-II (WMS-II)
↳ learning
* California verbal learning test-III (CVLT-III)
* Hopkins verbal learning test (HVLT)
* Buschke selective reminding test
* Rey verbal learning test (RVLT)
↳ attention and concentration
* digit vigilance test (DVT)
* digit span
* continuous performance test (CPT)
* paced auditory serial addition test (PASAT)
↳ California verbal learning test (CVLT)
* examines how errors are made rather than just totalling right and wrong answers
* examines a range of variables
* many forms of analysis are conducted
* has been used to compare patients with Alzheimer's or Huntington's, as well as Korsakoff's syndrome
* has been released in a children's version
* psychometrics are generally high, and correlates with other tests of same areas
↳ executive functioning
* category test (problem solving test)
* Stroop test
* Wisconsin card sorting test
* COWAT
* trails B
↳ personality
* MMPI-2
* personality assessment inventory
↳ emotional
* Beck depression inventory
* Beck anxiety inventory
* geriatric depression scale
* single construct measures
• GAD-7 (anxiety)
• PHQ-9 (depression)
* child behaviour checklist
→ stress and anxiety
↳ stress - a response to situations that involves demands, constraints, or opportunities
* a common experience but for some it advances to pathological, disruptive levels
* estimated to be involved in 50% to 80% of illnesses
↳ three components of stress
* frustration
* pressure
* conflict
↳ anxiety is an emotional state marked by worry, apprehension, and tension
→ state-trait anxiety inventory
↳ state anxiety varies from situation to situation - trait anxiety is a personality characteristic
↳ the STAI produces separate scores for each
↳ 20 items for each type of anxiety, 4 pt likert
↳ promising psychometrics
↳ correlates well with other measures of anxiety
→ ecological momentary assessment
↳ valuable to measure a single construct at different times, on an ongoing basis
↳ ecological momentary assessment (EMA) can measure physical qualities at diff. times
* also calls for recording information about moods, state anxiety, fatigue, etc.
* occurs in the natural environment and leads to a lot of data
↳ different technologies may improve this approach
→ NIH toolbox
↳ non-proprietary, publicly available
↳ four domains
* cognition
* motor
* emotion
* sensation
↳ 3 to 85 years of age
→ quality of life assessment
↳ two common themes
* premature mortality is not desirable
* quality of life is important
→ health-related quality of life
↳ WHO: "health is a complete state of physical, mental, and social well-being and not merely the absence of disease"
↳ two major approaches
* psychometric
* decision theory

computers and basic psychological science in testing ~ week 11
→ cognitive-behavioural assessment procedures vs. the medical model of assessment
↳ the rationale for cognitive-behavioural assessment
* traditional assessment based on the medical model
> disordered behaviour is a symptom of an underlying cause
* cognitive-behavioural assessment views behaviours, thoughts, and physiological processes as the problem
> does not deny the importance of psychological disorders
> often evaluates the importance of both internal and external factors
* more direct than traditional psychological tests
* traditional vs. cognitive-behavioural assessment
(aspect: traditional | cognitive-behavioural)
target: underlying cause | disordered behaviour
symptoms: superficial | focus of treatment
assessment: indirect; not related to treatment | direct; related to treatment
theory: medical model | behavioural model
goal: determine cause of symptoms | analyze disordered behaviour
↳ early procedures based on operant conditioning
* consequences of behaviours are thought to affect those behaviours in the future
* multiple steps:
> identify the critical behaviour of interest (deficits or excesses)
> employ interventions to increase or decrease the behaviour as needed
> determine what change in behaviour has occurred and adjust intervention
* used for a variety of problems - examples include smoking, poor study habits, addiction, and diet
* steps in a cognitive-behavioural assessment
① identify critical behaviours
② determine whether critical behaviours are excesses or deficits
③ evaluate critical behaviours for frequency, duration, or intensity (i.e., obtain a baseline)
④ if excesses, attempt to decrease frequency, duration, or intensity of behaviours; if deficits, attempt to increase behaviours
↳ self-report techniques
* observation of actual problematic behaviours is not always possible
* self-report techniques involve considering a list of various statements about a given situation - often true/false
* traditional methods focus on enduring internal characteristics, while cognitive-behavioural approach focuses on the importance of situations
> traditional method: a person is always fearful
> cognitive-behavioural method: a person is fearful only in certain circumstances
* the fear survey schedule (FSS)
> oldest and most researched cognitive-behavioural self-report procedure
> different versions have between 50 and 122 items, with a 5- or 7-point Likert scale
> different versions for different age groups and cross-cultural studies have been done
> attempts to identify situations or stimuli that elicit fear responses
* assertiveness
> the ability to appropriately stand or speak up for one's self in a difficult situation
> distinguished from aggressiveness (e.g., temper tantrum)
> many measures of assertiveness have been developed such as the Assertive Behaviour Survey Schedule
• requires you to imagine how you would behave in a situation that would typically call for an assertive response
* evaluation of self-report procedures
> psychometric data on self-report instruments have traditionally been lacking
> modern self-report procedures are very reminiscent of early pencil-and-paper tests
> typically have problems with face validity and all of its complications
> very few of them have been subjected to adequate, well-designed research
↳ the dysfunctional attitude scale
* Beck's cognitive model of psychopathology
* schemas - cognitive frameworks that guide our knowledge, beliefs, and actions
> negative (dysfunctional) schemas underlie pathological behaviours
* the DAS, with its parallel forms, assesses the presence and extent of such negative schemas
* 7-point Likert scale is used, and research supports the validity of the DAS
↳ irrational beliefs test
* in the same manner, the irrational or unrealistic beliefs we hold can drastically affect our emotions and actions
* the Irrational Beliefs Test (Jones, 1968) was designed to measure these cognitions using a 100-item scale
* agree or disagree with statements using a 5-point Likert scale
* adequate psychometrics, and was initially used quite heavily in clinical settings
↳ Irrational Beliefs Inventory (IBI)
* a weakness of the IBT was the failure to separate negative beliefs from negative emotions
* the IBI addressed that issue, attempting to assess only thoughts, independent of negative emotions
* uses a 5-point scale and contains 5 subscales
* consistent psychometrics, the 5 subscales are found to be independent of each other, and it has a variety of uses in clinical settings
↳ cognitive functional analysis
* the basis of cognitive functional analysis is the notion that what people say to themselves plays a large role in how they behave
* internal dialogue is a critical part of this assessment
* components
> environmental antecedents
> internal dialogue
> environmental consequences
* the role of self-monitoring devices/tools
→ psychophysiological procedures
↳ physiological variables with treatment implications
* is psychophysiological assessment feasible ~ yes
* examines involuntary responses to stimuli
* some examples of such measurement
> polygraph to measure various functions
• blood pressure
• heart rate
• galvanic skin response (GSR)
> tests to assess sexual responses
↳ evaluation of psychophysiological testing
* some studies find a relationship between physiological responses and certain cognitions
> intelligence vs. pupillary dilation
> pulse and skin conductance variability vs. processing intensity
* can assessment devices separate true changes from artifacts?
* what changes are considered significant?
* are demographic factors considered?
→ computers and psychological testing
↳ two basic ways to use computers in testing
* administer, score, and interpret tests
* create new tasks to test abilities that traditional procedures cannot
↳ the first computer assessment tool, "Eliza", was an AI-like creation that interviewed clients and gave empathetic responses
↳ its creator did not expect it to work
↳ computer-assisted interview
* computers can be used to gather individual data in a manner equal or more valid than a pencil-and-paper response
* if the computer version is approximately the same as a written version, what accounts for increased accuracy?
> standardized questions
> lack of social desirability effect
> reduces embarrassment for delicate topics
↳ computer-administered tests
* tests administered by computers are increasing, and results suggest that many produce similar evaluations as pencil-paper formats
* some do produce different results
> why?
> which is more accurate?
* benefits
> less time-consuming
> cost-effective
> often more accurate
> better accepted
↳ computer diagnosis, scoring, and reporting of results
* the use of computers is highly debated
* interpretation by computers appears to be similar to that of humans
* the same may be true for projective tests
* research is divided; no true replacing of humans
↳ internet usage for psychological testing
* the internet is a thriving source for psychological tests, but for entertainment
↳ tests only possible by computer
* virtual reality
> treatment of phobias
> exposure-based interventions
> certain types of social skills treatment
↳ computer adaptive testing
* tests that adjust to the answers provided
* time and cost reduction benefits
* not suitable for all types of testing situations
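The computer adaptive testing idea in these notes ("tests that adjust to the answers provided") can be sketched as a simple up/down staircase. This is a deliberate simplification: operational adaptive tests usually pick items with item response theory, which is not shown here. The names `run_adaptive_test` and `answer_item`, and the difficulty range 1 to 10, are assumptions for illustration.

```python
from typing import Callable, List, Tuple


def run_adaptive_test(answer_item: Callable[[int], bool], n_items: int = 5,
                      min_level: int = 1, max_level: int = 10,
                      start_level: int = 5) -> List[Tuple[int, bool]]:
    """Give n_items items, raising difficulty after a correct answer and
    lowering it after an incorrect one (simple staircase rule)."""
    level = start_level
    history: List[Tuple[int, bool]] = []
    for _ in range(n_items):
        correct = answer_item(level)  # present one item at the current difficulty
        history.append((level, correct))
        step = 1 if correct else -1
        # Keep the next difficulty inside the allowed range.
        level = max(min_level, min(max_level, level + step))
    return history


# Hypothetical examinee who answers correctly up to difficulty 6.
history = run_adaptive_test(lambda difficulty: difficulty <= 6)
```

For this hypothetical examinee the difficulty settles around levels 6 and 7, so few items are spent on questions that are far too easy or far too hard. That is where the time and cost savings come from.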

test bias ~ week 11
→ why is test bias controversial?
↳ all people may be created equal, but they are not all treated that way
↳ when test scores show differences, are those differences real, or an artifact of bias in the test?
↳ studies have consistently found the same differences, but why do they occur?
* environmental factors
* biological differences and the g factor
↳ even when such information about test takers is not reported, these gaps do not seem to close
→ the traditional defense of testing
↳ differential validity
* if a test is valid in different ways for different groups, is it really valuable?
* what factors impact the way these differences are considered?
↳ content-related evidence for validity
* items on intelligence tests may unfairly favour or hinder certain groups
* Flaugher (1978) ~ problems that appear to indicate test bias stem from misunderstanding about how tests are interpreted
* individual items are less important than general trends in scores
* a response to this is to point to specific items that are more familiar to certain groups, indicating bias
* Drasgow et al. (2016) found that "purifying" tests to eliminate items that would indicate bias did not reduce differences between groups
* differential item functioning (DIF) analysis
> Educational Testing Service (ETS)
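The notes mention differential item functioning (DIF) analysis without saying how it is computed. One common approach is the Mantel-Haenszel procedure: examinees are matched on total score, and the odds of answering the item correctly are compared between a reference group and a focal group within each score level. The sketch below assumes that procedure; the function names and the counts are made up for illustration.

```python
import math
from typing import Iterable, Tuple

# One matched score level:
# (reference correct, reference incorrect, focal correct, focal incorrect)
Stratum = Tuple[int, int, int, int]


def mantel_haenszel_odds_ratio(strata: Iterable[Stratum]) -> float:
    """Common odds ratio across score levels; 1.0 means no DIF on the item."""
    numerator = 0.0
    denominator = 0.0
    for ref_correct, ref_incorrect, focal_correct, focal_incorrect in strata:
        n = ref_correct + ref_incorrect + focal_correct + focal_incorrect
        if n == 0:
            continue  # an empty score level carries no information
        numerator += ref_correct * focal_incorrect / n
        denominator += ref_incorrect * focal_correct / n
    if denominator == 0:
        raise ValueError("odds ratio is undefined for these counts")
    return numerator / denominator


def mh_delta(odds_ratio: float) -> float:
    """Rescale the odds ratio to the delta metric, -2.35 * ln(odds ratio)."""
    return -2.35 * math.log(odds_ratio)


# Made-up counts: both groups do equally well at each matched score level.
no_dif = mantel_haenszel_odds_ratio([(30, 10, 30, 10), (20, 20, 20, 20)])
# Made-up counts: the reference group does better at the same score level.
some_dif = mantel_haenszel_odds_ratio([(40, 10, 20, 30)])
```

Here `no_dif` is 1.0 and `some_dif` is 6.0. An odds ratio above 1 means the item favours the reference group even among examinees with the same total score, which is the kind of item a "purifying" step would flag.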

