Psychometrics
A Different Format
Previous talks were generally about one topic
Today's presentation: Where does this stuff come up at MP, outside of the psychos?
A little bit of info on several different things
The goals
Understand various psychometric analyses as they arise in day-to-day work
See which stats are used in different applications
Answer questions
Topics Covered
Things you'd find in a key verification file
Classical stats (p-values, point biserials)
Things you'd find at a form pulling
IRT stats (TCCs, TIFs)
Things you'd find in a technical manual
All sorts of info
A question you'd hear at a standard setting
IRT
1. Key Verification Files
Purpose: To check the correctness of answer keys (MC items)
A list of items whose stats are unusual or merit further investigation
Items identified based on their p-values and/or point biserials
P-value: The proportion of students answering an item correctly
How easy is the item?
Point biserial: The correlation between item score and total score
If you do well on the item, do you tend to do well on the test?
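As a minimal sketch of the two definitions above, the following computes p-values and point biserials from a small scored-response matrix. The data are made up for illustration, the function names are hypothetical, and the total score here includes the item itself (some programs remove the item from the total first).

```python
import numpy as np

# Made-up scored responses (1 = correct, 0 = incorrect), for illustration only.
# Rows are students, columns are items.
scores = np.array([
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
])

def p_values(scores):
    """Proportion of students answering each item correctly."""
    return scores.mean(axis=0)

def point_biserials(scores):
    """Pearson correlation between each item score and the total score."""
    total = scores.sum(axis=1)
    return np.array([np.corrcoef(scores[:, j], total)[0, 1]
                     for j in range(scores.shape[1])])

print(p_values(scores))
print(point_biserials(scores))
```

A high p-value means an easy item; a high point biserial means students who got the item right tended to score well overall.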
When might we be alarmed?
Not many kids are picking the right answer
The p-value is low (less than .25)
Low-performing kids are doing better on the item than high-performing kids
The point biserial is low (less than .15)
and/or
If an incorrect answer choice has strange stats
Distractor Stats
Distractor p-value: The proportion of students picking the distractor (say, choice C when the correct answer is B)
How popular is choice C?
Flag item if distractor p-value is higher than .3
Distractor point biserial: The correlation between picking the distractor and total test score
If you picked C, how well did you tend to do on the test?
Flag item if distractor PBS is positive
An Operational Example
A recent item had the following stats:
Key = D
P-value = 0.10
Point biserial = 0.02
P-value for C = 0.60
Point biserial for C = 0.20
So the key was wrong? Nope
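The flagging rules quoted on the earlier slides can be sketched as a small function and applied to this item. The thresholds (.25, .15, .3, positive distractor PBS) come from the slides; the function name and the shape of its arguments are assumptions made for illustration, not the actual key verification program.

```python
# Thresholds quoted in the slides.
LOW_P = 0.25              # key p-value below this is alarming
LOW_PBS = 0.15            # key point biserial below this is alarming
HIGH_DISTRACTOR_P = 0.3   # distractor p-value above this is alarming

def flag_item(key_p, key_pbs, distractors):
    """Return the reasons an item should be listed for key verification.

    `distractors` maps an answer choice to (p-value, point biserial).
    """
    reasons = []
    if key_p < LOW_P:
        reasons.append("low p-value")
    if key_pbs < LOW_PBS:
        reasons.append("low point biserial")
    for choice, (p, pbs) in distractors.items():
        if p > HIGH_DISTRACTOR_P:
            reasons.append(f"distractor {choice} is popular")
        if pbs > 0:
            reasons.append(f"distractor {choice} has a positive point biserial")
    return reasons

# The operational example: key D, with the stats reported for choice C.
reasons = flag_item(key_p=0.10, key_pbs=0.02, distractors={"C": (0.60, 0.20)})
print(reasons)
```

The item trips every rule, yet the key was still right. A flag is a reason to look at the item, not a verdict.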
How Can That Happen?
An example: What is the definition of the word "travesty"?
A: Mockery
B: Injustice
C: Belly button
D: Some even stupider answer than belly button
Actual definition: Any grotesque or debased likeness or imitation
The correct answer is A, but "travesty of justice" threw off the high-performing students
To sum up
Psychometrics can help us identify items whose keys need to be checked
Stats used:
P-values
Point biserials
Distractor p-values and point biserials
P-values & point biserials should be relatively high, distractor values should be relatively low
The key usually turns out to be right, but that's OK
2. Form Pulling
Context: We are choosing items for next year's exam
Clients like to look at psychometric info when picking items (e.g., MCAS)
We know the stats ahead of time because items were field tested
Relevant stats: Test Characteristic Curves (TCCs), raw score cut points, Test Information Functions (TIFs)
This stuff relates to Item Response Theory (IRT)
A TCC is a plot that tells you the expected raw score for each value of ability (denoted theta)
As ability increases, expected raw score increases
Example of a TCC: 5 Items
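A TCC can be sketched as the sum of the item probabilities at each theta. The slides do not say which IRT model is used, so this example assumes a three-parameter logistic (3PL) model, and the five sets of item parameters are hypothetical.

```python
import numpy as np

def item_prob(theta, a, b, c=0.0):
    """3PL probability of a correct answer (reduces to 2PL when c = 0)."""
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

def tcc(theta, items):
    """Expected raw score at ability theta: the sum of the item probabilities."""
    return sum(item_prob(theta, a, b, c) for a, b, c in items)

# Hypothetical (a, b, c) parameters for five 1-point MC items.
items = [(1.0, -1.5, 0.2), (1.2, -0.5, 0.2), (0.8, 0.0, 0.2),
         (1.1, 0.7, 0.2), (1.3, 1.5, 0.2)]

for theta in (-2.0, 0.0, 2.0):
    print(theta, round(float(tcc(theta, items)), 2))
```

The printed values rise with theta, which is the "as ability increases, expected raw score increases" property.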
Raw Score Cut Points
Suppose test has 4 performance levels: Below Basic, Basic, Proficient, Advanced
How many points do you need in order to reach the Basic level? Proficient? Advanced?
Example: Test goes from 0 to 72. Need 35 to reach Basic; 51 to reach Proficient; 63 to reach Advanced
Standard Setting often tells us theta cut points; clients want to know raw score cuts
Using the TCC to find a cut point
Suppose theta cut is 0.4
Find expected raw score at 0.4 using the TCC. It is 3.3
Cut is placed between 3 and 4
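The lookup described above can be sketched in two steps: evaluate the TCC at the theta cut, then bracket the expected raw score with the whole-number scores on either side. The 2PL model and the item parameters below are assumptions for illustration; only the 3.3 example comes from the slide.

```python
import math

def tcc(theta, items):
    """Expected raw score under a 2PL model (an assumption for this sketch)."""
    return sum(1.0 / (1.0 + math.exp(-a * (theta - b))) for a, b in items)

def cut_bracket(expected_raw_score):
    """Whole-number raw scores just below and just above the expected score."""
    lower = math.floor(expected_raw_score)
    return lower, lower + 1

# The slide's example: an expected raw score of 3.3 at the theta cut.
print(cut_bracket(3.3))

# Hypothetical (a, b) parameters for five 1-point items.
items = [(1.0, -1.5), (1.2, -0.5), (0.8, 0.0), (1.1, 0.7), (1.3, 1.5)]
print(cut_bracket(tcc(0.4, items)))
```

Under this reading, a student would need the higher of the two scores to reach the level.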
Test Information Functions
TIFs tell us the test precision at each level of ability
The higher the curve, the more precision
Easy items give us precision for low values of theta. Similarly:
Hard items give precision at high values
Medium items give precision at medium values
Example of a TIF
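To see why easy items add precision at low theta, here is a sketch of a TIF under a 2PL model, where each item contributes a^2 * P * (1 - P). The model choice and the item parameters are assumptions for illustration.

```python
import numpy as np

def item_info(theta, a, b):
    """2PL item information: a^2 * P * (1 - P)."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a ** 2 * p * (1.0 - p)

def tif(theta, items):
    """Test information: the sum of the item information functions."""
    return sum(item_info(theta, a, b) for a, b in items)

thetas = np.linspace(-3, 3, 61)
easy = [(1.0, -1.5), (1.0, -1.0)]   # hypothetical easy items (low b)
hard = [(1.0, 1.0), (1.0, 1.5)]     # hypothetical hard items (high b)

peak_easy = thetas[np.argmax(tif(thetas, easy))]
peak_hard = thetas[np.argmax(tif(thetas, hard))]
print(peak_easy, peak_hard)
```

The easy set peaks at a negative theta and the hard set at a positive theta, so swapping hard items for easy ones moves information toward the low end of the scale.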
Why does the client care?
It is often desired that next year's forms are similar to this year's forms
Make sure tests are correct difficulty (TCC, raw score cut points) & precision (TIF)
Match TCCs, cut points, TIFs of the two years
Why should the forms be similar?
Theoretically, we should be able to account for differences through equating (Liz)
However, want the student experience to be similar from year to year
Don't want to give easy test to Class of '07, hard test to Class of '08
Don't want to make this year's test less precise than last year's
Example: 2007 MCAS, Grade 10 Math
Proposed 2007 TCC was lower than last year's
Solution: Replace some hard items with easy items
Example, Continued
Proposed 2007 TIF had less info at low abilities, more info at high abilities
Solution:
Replace some hard items with easy items
Use hard items with lower PBS, easy items with higher PBS
Example, Continued
RAW SCORE CUTS
OLD    PROPOSED
20     15
33     28
45     41
Proposed 2007 raw score cuts lower than 2006 raw score cuts
Solution: Replace some hard items with easy items
Guide to making changes
Some rules of thumb for different problems:
[Table: columns TCC, TIF, Cuts; rows Too low, Too high]
To sum up
Item Response Theory is useful in form pulling
TCCs, raw score cuts, TIFs are often examined
Proposed values should be similar to current year's
Tests shouldn't be too easy or hard
Tests should be informative but not too informative
It's helpful to know how we can change these things based on item stats
3. Technical Manuals
Things in Technical Manuals vary from program to program
Often see some of the following:
P-values and point biserials (thanks Louis!)
Test reliabilities (thanks Louis!)
TCCs and TIFs (thanks Mike!)
DIF (thanks Won!)
Standard Setting (thanks Liz and Abdullah!)
Equating (thanks in advance Liz!)
Interrater reliability (thanks for nothing!)
Decision consistency and accuracy (ditto)
Technical Manuals: P-Values & Point Biserials
You'll often see a table like this:
Grade  Subject  Stat  ALL          MC           OR
       MAT      Dif   0.67 (0.15)  0.7 (0.13)   0.61 (0.16)
       MAT      Disc  0.44 (0.08)  0.43 (0.07)  0.47 (0.1)
       MAT            142          89           53
       REA      Dif   0.67 (0.15)  0.71 (0.13)  0.52 (0.11)
       REA      Disc  0.48 (0.1)   0.45 (0.09)  0.6 (0.05)
       REA            85           70           15
Technical Manuals: Reliabilities (and other stats)
Louis said: Reliability is the correlation between scores on parallel forms. Higher reliability means greater consistency
You'll often see a table like this:
Grade  Subject  N      Points  Min  Max  Mean    S.D.    Rel.
       MAT      32219  65           65   40.341  13.693  0.934
       REA      32087  52           52   31.446  10.869  0.895
       MAT      32673  65           65   39.628  13.043  0.925
       REA      32527  52           52   33.452  9.112   0.891
       MAT      33532  66           66   31.546  13.68   0.917
       REA      33402  52           52   29.153  8.64    0.876
Technical Manuals: TCCs and TIFs
Give TCC, TIF of each grade/content area
Technical Manuals: DIF
Won said: An item has DIF if the probability of getting the item right is dependent on group membership (e.g., gender, ethnic group)
Measured Progress uses a method called the Standardized P-Difference
Comparing groups
Male-Female
White-Black
White-Hispanic
Minimum 200 examinees in each group
DIF, Continued
A: [-0.05 ~ 0.05] negligible
B: [-0.1 ~ -0.05) and (0.05 ~ 0.1] low
C: outside the [-0.1 ~ 0.1] high
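The A/B/C categories can be sketched as a lookup on the size of the standardized p-difference, taking the category boundaries as plus or minus 0.05 and 0.1. The second function shows one common form of the statistic: the difference in proportion correct between focal and reference groups at each matched total-score level, averaged with focal-group counts as weights. That weighting is an assumption here, since the slides do not give the formula, and the numbers are made up.

```python
def classify_dif(std_p_diff):
    """Map a standardized p-difference to the A/B/C categories."""
    size = abs(std_p_diff)
    if size <= 0.05:
        return "A"   # negligible
    if size <= 0.10:
        return "B"   # low
    return "C"       # high

def standardized_p_difference(focal_counts, focal_p, reference_p):
    """Focal-weighted mean of (focal - reference) proportion correct,
    taken across matched total-score levels."""
    total = sum(focal_counts)
    return sum(n * (pf - pr)
               for n, pf, pr in zip(focal_counts, focal_p, reference_p)) / total

# Made-up data for three score levels (low, middle, high scorers).
d = standardized_p_difference([50, 100, 50], [0.30, 0.50, 0.80], [0.35, 0.60, 0.80])
print(round(d, 4), classify_dif(d))
```

Matching on total score is what separates DIF from a plain difference in group means: the comparison is between students who did equally well on the test as a whole.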
DIF, Continued
You may see a table like this:
Technical Manuals: Standard Setting & Equating
Liz and Abdullah discussed Standard Setting
In technical manuals, you'll often see:
Report/summary of standard setting process
Info about panelists (how many, who they are)
What method was used (e.g., bookmark/Body of Work)
Cut points
Info about panelist evaluations
Equating: Come next week and find out!
Interrater reliability
When constructed-response items are rated by multiple scorers, how well do raters agree?
The more agreement, the better
Exact agreement: What % of the time do they give the same score?
Adjacent agreement: What % of the time are they off by 1?
Reading Open Response
Agreement   Exact  Adjacent  >1
Percentage  69.3   27.4      3.3
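Exact and adjacent agreement are simple counts over paired scores. A minimal sketch, using made-up scores from two hypothetical raters:

```python
def agreement_rates(rater1, rater2):
    """Percent of responses with exact, adjacent (off by 1), and >1 agreement."""
    diffs = [abs(a - b) for a, b in zip(rater1, rater2)]
    n = len(diffs)
    exact = 100.0 * sum(d == 0 for d in diffs) / n
    adjacent = 100.0 * sum(d == 1 for d in diffs) / n
    return exact, adjacent, 100.0 - exact - adjacent

# Made-up scores on a 0-4 open-response item, for illustration only.
rater1 = [0, 1, 2, 3, 4, 2, 1, 3, 2, 4]
rater2 = [0, 1, 2, 2, 4, 2, 3, 3, 1, 4]

exact, adjacent, beyond = agreement_rates(rater1, rater2)
print(exact, adjacent, beyond)
```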
Decision Accuracy and Consistency: Introduction
For most programs, four achievement levels, e.g., Below Basic, Basic, Proficient, Advanced
Decision accuracy: degree to which observed categorizations match true categorizations
Decision consistency: degree to which observed categorizations match those of a parallel form
Intuitive examples of accuracy
TRUE LEVEL: Proficient
OBSERVED LEVEL: Proficient
DIAGNOSIS: ACCURATE (GOOD)
TRUE LEVEL: Proficient
OBSERVED LEVEL: Below Basic
DIAGNOSIS: INACCURATE (BAD). False negative
TRUE LEVEL: Basic
OBSERVED LEVEL: Advanced
DIAGNOSIS: INACCURATE (BAD). False positive
Intuitive examples of consistency
OBSERVED LEVEL, Form 1: Basic
OBSERVED LEVEL, Form 2: Basic
DIAGNOSIS: CONSISTENT (GOOD)
OBSERVED LEVEL, Form 1: Basic
OBSERVED LEVEL, Form 2: Advanced
DIAGNOSIS: INCONSISTENT (BAD)
Decision Accuracy and Consistency: Introduction
Livingston and Lewis (1995) proposed a method of estimating decision accuracy/consistency
For most programs, many stats are computed. We will give an example of each
The stats are all based on joint distributions
A joint distribution gives the proportion of times that 2 things both happen.
What proportion of students are truly Basic and are observed as Below Basic?
Joint Distribution: True/Observed Achievement Levels
Rows: True Status. Columns: Observed Status.
       BB      B       P       A       Total
BB     0.0706  0.0176  0.0007  0.0000  0.0889
B      0.0320  0.1058  0.0436  0.0000  0.1814
P      0.0014  0.0532  0.4726  0.0734  0.6007
A      0.0000  0.0000  0.0296  0.0993  0.1290
Total  0.1041  0.1766  0.5466  0.1728  1.0000
Overall accuracy: 0.7484
Joint Distribution: Observed/Observed Achievement Levels
Rows: Observed Status, Form 1. Columns: Observed Status, Form 2.
       BB      B       P       A       Total
BB     0.0673  0.0310  0.0058  0.0000  0.1041
B      0.0310  0.0820  0.0632  0.0003  0.1766
P      0.0058  0.0632  0.4066  0.0709  0.5466
A      0.0000  0.0003  0.0709  0.1015  0.1728
Total  0.1041  0.1766  0.5466  0.1728  1.0000
Overall consistency: 0.6574
Indices Conditional upon Level
Proportion of students correctly classified, given true level
Proportion of students consistently classified by parallel form, given observed level
      Accuracy  Consistency
BB    0.7945    0.6466
B     0.5831    0.4645
P     0.7868    0.7439
A     0.7702    0.5876
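The overall and conditional accuracy figures follow directly from the True/Observed joint distribution: overall accuracy is the sum of the diagonal, and accuracy given a true level is that level's diagonal cell divided by its row total. A sketch, reading the table with true status in rows and observed status in columns (the variable names are illustrative):

```python
import numpy as np

levels = ["BB", "B", "P", "A"]

# True/Observed joint distribution from the slide.
# Rows: true level. Columns: observed level.
joint = np.array([
    [0.0706, 0.0176, 0.0007, 0.0000],
    [0.0320, 0.1058, 0.0436, 0.0000],
    [0.0014, 0.0532, 0.4726, 0.0734],
    [0.0000, 0.0000, 0.0296, 0.0993],
])

overall_accuracy = np.trace(joint)                   # sum of the diagonal
conditional = np.diag(joint) / joint.sum(axis=1)     # diagonal / row total

print(round(float(overall_accuracy), 4))
for level, value in zip(levels, conditional):
    print(level, round(float(value), 4))
```

The results agree with the slide values to within rounding of the four-decimal table entries. The same two lines applied to the Observed/Observed table give the consistency indices.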
Indices at Cut Points
Accuracy & consistency at specified cut point
Accuracy: What is the chance that a student is classified on the correct side of a cut point?
Consistency: What is the chance that a student is classified on the same side of a cut point twice?
      Accuracy  False Positive  False Negative  Consistency
BB:B  0.9483    0.0183          0.0334          0.9264
B:P   0.9011    0.0443          0.0546          0.8612
P:A   0.8969    0.0734          0.0296          0.8575
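The cut-point indices can be sketched from the two joint distributions by collapsing the four levels into "below the cut" and "at or above the cut". A false positive is a student truly below the cut but observed above it; a false negative is the reverse. The function name and the row/column reading of the tables (true or Form 1 in rows, observed or Form 2 in columns) are assumptions of this sketch.

```python
import numpy as np

# Rows: true level. Columns: observed level. Order: BB, B, P, A.
true_observed = np.array([
    [0.0706, 0.0176, 0.0007, 0.0000],
    [0.0320, 0.1058, 0.0436, 0.0000],
    [0.0014, 0.0532, 0.4726, 0.0734],
    [0.0000, 0.0000, 0.0296, 0.0993],
])
# Rows: observed level on Form 1. Columns: observed level on Form 2.
form1_form2 = np.array([
    [0.0673, 0.0310, 0.0058, 0.0000],
    [0.0310, 0.0820, 0.0632, 0.0003],
    [0.0058, 0.0632, 0.4066, 0.0709],
    [0.0000, 0.0003, 0.0709, 0.1015],
])

def cut_indices(true_observed, form1_form2, k):
    """Indices at the cut between level k-1 and level k (k = 1, 2, 3)."""
    false_pos = true_observed[:k, k:].sum()   # truly below, observed above
    false_neg = true_observed[k:, :k].sum()   # truly above, observed below
    accuracy = 1.0 - false_pos - false_neg
    # Inconsistent: the two forms place the student on opposite sides of the cut.
    consistency = 1.0 - form1_form2[:k, k:].sum() - form1_form2[k:, :k].sum()
    return accuracy, false_pos, false_neg, consistency

results = {}
for k, name in enumerate(["BB:B", "B:P", "P:A"], start=1):
    results[name] = cut_indices(true_observed, form1_form2, k)
    print(name, [round(float(x), 4) for x in results[name]])
```

The output matches the table to within rounding. Accuracy at a single cut is much higher than overall accuracy because only misclassifications that cross that particular cut count against it.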
To sum up
Lots of stuff in technical manuals
Both classical test theory material (p-values, point biserials, reliabilities) & IRT material (TCCs, TIFs, equating) are important to understand
Hopefully, these seminars have helped familiarize you with their contents
4. Standard Setting
Comes up all the time outside Psychoville
Should be a perfect topic for this talk, but Liz and Abdullah already did a wonderful job
4. Standard Setting
Standard Setting is the process of recommending cut scores between achievement levels
Advanced (A)
Cutpoint 3
Proficient (P)
Cutpoint 2
Below Proficient (BP)
Cutpoint 1
Failing (F)
Focus on one FAQ in bookmark:
How do we determine the arrangement of items in the ordered item booklets?
Brief Review of Bookmark
Each panelist makes use of the ordered item booklet (OIB)
Items in the OIB are presented from easiest to hardest.
One page per MC item
Panelists' job is to place a bookmark in the OIB for each cut
For a given cut, where do panelists place a bookmark?
Where they think borderline students would no longer have a 2/3 chance (or better) of a correct answer
Abdullah said: cut points are derived from bookmark placements
A Very Frequently Asked Question
First, a FMC: "You messed up the order of the items!"
Then, the FAQ: "Well, how did you determine the order?"
Important: Order is based on actual student performance
We use the concept of IRT
Two MC items: Which is easier?
Easier item
Harder item
Depending on IRT model, this issue can become quite complex
An Intuitive Explanation
An easy item: An item that even low-ability students get right a high proportion of the time
That is, students with small theta values tend to get it right
Which item has the smallest theta value corresponding to a high probability of a correct answer?
How high a probability? Use 2/3 for consistency
IN SUM: Easiest item is the one with the smallest theta corresponding to p = 2/3
Hardest has largest theta corresponding to p = 2/3
Use the 2/3 Criterion
Easiest to hardest: Orange, green, red, purple, blue
Thetas: -0.6, -0.2, 0.3, 0.8, 1.2
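The 2/3 criterion can be sketched by solving the item response function for the theta at which the probability of a correct answer equals 2/3, then sorting items by that theta. This example assumes a 3PL model (2PL when c = 0); the item names and parameters are hypothetical and are not the five items on the slide.

```python
import math

def theta_at(p_target, a, b, c=0.0):
    """Ability at which a 3PL item has P(correct) = p_target.

    Requires c < p_target < 1.
    """
    q = (p_target - c) / (1.0 - c)           # remove the guessing floor
    return b + math.log(q / (1.0 - q)) / a   # invert the logistic

# Hypothetical (a, b, c) parameters.
items = {
    "item_1": (1.0, 0.5, 0.2),
    "item_2": (1.5, 0.3, 0.2),
    "item_3": (0.7, -0.4, 0.0),
}

locations = {name: theta_at(2 / 3, *params) for name, params in items.items()}
oib_order = sorted(locations, key=locations.get)
print(oib_order)
```

Note that item_3 has the lowest b parameter but does not come first: its low discrimination pushes its 2/3 point higher. This is one way the ordering question gets complicated depending on the IRT model.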
How about polytomous items?
A polytomous item is one that has more than 2 possible scores
MC items are dichotomous (0/1), not polytomous
Example of polytomous: OR item scored 0, 1, 2, 3, 4
Such an OR item is in the OIB four times, once for each score point 1, 2, 3, 4
Where do you put this item's 4 pages in the OIB?
Incorporating polytomous items
Just as with dichotomous items, we use IRT
What theta do you need to have a 2/3 chance of getting a 1 or better? 2 or better? 3 or better? 4?
The theta must increase as the score increases
Suppose the results are: -0.4, 0.4, 0.6, 1.8
Easiest to hardest: Orange, green, red, purple, blue
Thetas: -0.6, -0.2, 0.3, 0.8, 1.2
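Once every MC item and every OR score point has a theta location, building the OIB is a single sort. A sketch, assuming the theta values are -0.6, -0.2, 0.3, 0.8, 1.2 for the five MC items and -0.4, 0.4, 0.6, 1.8 for the four OR score points:

```python
# Theta at p = 2/3 for each MC item (assumed values, see lead-in).
mc_items = {"orange": -0.6, "green": -0.2, "red": 0.3, "purple": 0.8, "blue": 1.2}

# Theta needed for a 2/3 chance of scoring k or better on the OR item.
or_score_points = {1: -0.4, 2: 0.4, 3: 0.6, 4: 1.8}

pages = [(theta, name) for name, theta in mc_items.items()]
pages += [(theta, f"OR item, score {k}") for k, theta in or_score_points.items()]

oib = sorted(pages)   # easiest (smallest theta) first
for theta, label in oib:
    print(f"{theta:5.1f}  {label}")
```

The OR item's four pages end up interleaved with the MC items rather than grouped together, which is why the same item can appear at several places in the booklet.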