Interpretation


Interpretation: How to Use Psychometrics

A Different Format
Previous talks were generally about one topic
Today's presentation: Where does this stuff come up at MP, outside of the psychos?
A little bit of info on several different things

The goals
Understand various psychometric analyses as they arise in day-to-day work
See which stats are used in different applications
Answer questions

Topics Covered
Things you'd find in a key verification file
Classical stats (p-values, point-biserials)

Things you'd find at a form pulling
IRT stats (TCCs, TIFs)

Things you'd find in a technical manual
All sorts of info

A question you'd hear at a standard setting
IRT

1. Key Verification Files
Purpose: To check the correctness of answer keys (MC items)
A list of items whose stats are unusual or merit further investigation
Items identified based on their p-values and/or point-biserials

P-value: The proportion of students answering an item correctly
How easy is the item?

Point-biserial: The correlation between item score and total score
If you do well on the item, do you tend to do well on the test?
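A minimal sketch of both statistics, assuming a hypothetical scored-response layout (one row per student, one 0/1 column per MC item). It correlates each item with the uncorrected total score, as defined above; some programs use a corrected total that leaves the item out, which this sketch does not do.

```python
from statistics import mean, pstdev

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return cov / (pstdev(x) * pstdev(y))

def item_stats(scores):
    """scores: one row per student, one 0/1 entry per MC item.
    Returns (p_value, point_biserial) for each item."""
    totals = [sum(row) for row in scores]
    stats = []
    for j in range(len(scores[0])):
        item = [row[j] for row in scores]
        p_value = mean(item)          # proportion answering correctly
        pbs = pearson(item, totals)   # item score vs. total test score
        stats.append((p_value, pbs))
    return stats

# Hypothetical responses: 5 students x 4 items
scores = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 1, 1],
]
for j, (p, r) in enumerate(item_stats(scores), start=1):
    print(f"item {j}: p-value={p:.2f}, point-biserial={r:.2f}")
```

An item that every student gets right (or wrong) has zero variance, so its point-biserial is undefined; this sketch would raise ZeroDivisionError for it.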

When might we be alarmed?
Not many kids are picking the right answer
The p-value is low (less than .25)

Low-performing kids are doing better on the item than high-performing kids
The point-biserial is low (less than .15)

and/or

If an incorrect answer choice has strange stats
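The two alarm rules for the keyed answer can be written as a small check. The function name is hypothetical, and the default thresholds are the .25 and .15 from the slide; a given program may use other cutoffs.

```python
def key_check_flags(p_value, point_biserial, p_min=0.25, pbs_min=0.15):
    """Return the reasons (if any) an item's key should be double-checked."""
    reasons = []
    if p_value < p_min:
        reasons.append("low p-value: few students chose the key")
    if point_biserial < pbs_min:
        reasons.append("low point-biserial: key does not separate high and low scorers")
    return reasons

# An item with p = 0.10 and point-biserial = 0.02 trips both rules
print(key_check_flags(0.10, 0.02))
```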

Distractor Stats
Distractor p-value: The proportion of students picking the distractor (say, choice C when the correct answer is B)
How popular is choice C?
Flag item if distractor p-value is higher than .3

Distractor point-biserial: The correlation between picking the distractor and total test score
If you picked C, how well did you tend to do on the test?
Flag item if distractor PBS is positive
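The same idea extends to every answer choice: treat "picked this option" as a 0/1 score, then compute its proportion and its correlation with the total score. The sketch below uses hypothetical data and function names; the flag thresholds (above .3, or a positive PBS) come from the slide.

```python
from statistics import mean, pstdev

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return cov / (pstdev(x) * pstdev(y))

def option_stats(choices, totals, options="ABCD"):
    """choices: letter each student picked on one item; totals: each student's total score.
    Returns {option: (proportion choosing it, point-biserial of choosing it)}."""
    stats = {}
    for opt in options:
        picked = [1 if ch == opt else 0 for ch in choices]
        p = mean(picked)
        # the correlation is undefined when nobody (or everybody) picked the option
        pbs = pearson(picked, totals) if 0 < p < 1 else float("nan")
        stats[opt] = (p, pbs)
    return stats

def distractor_flags(stats, key, p_max=0.3):
    """Distractors chosen by more than p_max of students or with a positive point-biserial."""
    return [opt for opt, (p, pbs) in stats.items()
            if opt != key and (p > p_max or pbs > 0)]

# Hypothetical item keyed B, answered by 5 students
choices = ["B", "B", "C", "C", "A"]
totals = [10, 9, 8, 3, 2]
stats = option_stats(choices, totals)
print(distractor_flags(stats, key="B"))
```

Here C is flagged for popularity (40% of students), even though its point-biserial is negative.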

An Operational Example
A recent item had the following stats:

Key = D
P-value = 0.10
Point-biserial = 0.02
P-value for C = 0.60
Point-biserial for C = 0.20

So the key was wrong? Nope.

How Can That Happen?
An example: What is the definition of the word "travesty"?
A: Mockery
B: Injustice
C: Bellybutton
D: Some even stupider answer than bellybutton
Actual definition: Any grotesque or debased likeness or imitation
The correct answer is A, but "travesty of justice" threw off the high-performing students

To sum up
Psychometrics can help us identify items whose keys need to be checked
Stats used:
P-values
Point-biserials
Distractor p-values and point-biserials

P-values & point-biserials should be relatively high; distractor values should be relatively low
The key usually turns out to be right, but that's OK

2. Form Pulling
Context: We are choosing items for next year's exam
Clients like to look at psychometric info when picking items (e.g., MCAS)
We know the stats ahead of time because the items were field tested
Relevant stats: Test Characteristic Curves (TCCs), raw score cut points, Test Information Functions (TIFs)

This stuff relates to Item Response Theory (IRT)
A TCC is a plot that tells you the expected raw score for each value of ability (denoted theta)
As ability increases, expected raw score increases
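A TCC is the sum of the item response curves. The slides don't specify the IRT model, so this sketch assumes the three-parameter logistic (3PL) model with scaling constant D = 1.7 and made-up item parameters for a 5-item test.

```python
import math

D = 1.7  # logistic scaling constant (assumed)

def p_3pl(theta, a, b, c):
    """Probability of a correct answer under the 3PL model."""
    return c + (1 - c) / (1 + math.exp(-D * a * (theta - b)))

def tcc(theta, items):
    """Expected raw score at ability theta: sum of item probabilities (1 point per MC item)."""
    return sum(p_3pl(theta, a, b, c) for a, b, c in items)

# Hypothetical (a, b, c) parameters for a 5-item test
ITEMS = [(1.0, -1.5, 0.20), (1.2, -0.5, 0.20), (0.8, 0.0, 0.25),
         (1.5, 0.7, 0.20), (1.1, 1.6, 0.15)]

for theta in (-3, -2, -1, 0, 1, 2, 3):
    print(f"theta = {theta:+d}: expected raw score = {tcc(theta, ITEMS):.2f}")
```

The curve rises from the sum of the guessing parameters (1.0 here) toward the number of items (5) as theta increases.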

Example of a TCC: 5 Items

Raw Score Cut Points
Suppose a test has 4 performance levels: Below Basic, Basic, Proficient, Advanced
How many points do you need in order to reach the Basic level? Proficient? Advanced?
Example: Test goes from 0 to 72. Need 35 to reach Basic; 51 to reach Proficient; 63 to reach Advanced
Standard setting often tells us theta cut points; clients want to know raw score cuts
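Once the raw score cuts are known, assigning a level is a lookup. A sketch using the 0 to 72 example above (the function name is hypothetical):

```python
from bisect import bisect_right

LEVELS = ["Below Basic", "Basic", "Proficient", "Advanced"]
CUTS = [35, 51, 63]  # raw points needed for Basic, Proficient, Advanced on the 0-72 example

def performance_level(raw_score, cuts=CUTS, levels=LEVELS):
    """Level reached = number of cuts the score meets or exceeds."""
    return levels[bisect_right(cuts, raw_score)]

for score in (34, 35, 50, 51, 62, 63, 72):
    print(score, performance_level(score))
```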

Using the TCC to find a cut point

Suppose the theta cut is 0.4
Find the expected raw score at 0.4 using the TCC. It is 3.3
The cut is placed between 3 and 4
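Going from a theta cut to a raw cut can be sketched by evaluating the TCC at the theta cut. One reading of "the cut is placed between 3 and 4" is that a student needs 4 points, so this sketch rounds the expected score up to the next whole point; that rounding rule, the 3PL model (D = 1.7), and the item parameters are all assumptions, and programs may round differently.

```python
import math

D = 1.7  # logistic scaling constant (assumed)

def p_3pl(theta, a, b, c):
    """Probability of a correct answer under the 3PL model."""
    return c + (1 - c) / (1 + math.exp(-D * a * (theta - b)))

def expected_raw_score(theta, items):
    """TCC value: sum of item probabilities."""
    return sum(p_3pl(theta, a, b, c) for a, b, c in items)

def raw_cut(theta_cut, items):
    """Smallest whole raw score at or above the TCC value at theta_cut."""
    return math.ceil(expected_raw_score(theta_cut, items))

# Hypothetical (a, b, c) parameters for a 5-item test
ITEMS = [(1.0, -1.5, 0.20), (1.2, -0.5, 0.20), (0.8, 0.0, 0.25),
         (1.5, 0.7, 0.20), (1.1, 1.6, 0.15)]

print(round(expected_raw_score(0.4, ITEMS), 2), raw_cut(0.4, ITEMS))
```

With these made-up parameters the expected raw score at theta = 0.4 happens to be about 3.27, so the raw cut comes out as 4.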

Test Information Functions
TIFs tell us the test precision at each level of ability
The higher the curve, the more precision
Easy items give us precision for low values of theta. Similarly:
Hard items give precision at high values
Medium items give precision at medium values
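Test information is the sum of item information, and its reciprocal square root is the standard error of measurement at each theta, which is why a higher curve means more precision. This sketch assumes the 3PL model (D = 1.7) with hypothetical parameters for one easy and one hard item, using the standard 3PL item-information formula.

```python
import math

D = 1.7  # logistic scaling constant (assumed)

def p_3pl(theta, a, b, c):
    """Probability of a correct answer under the 3PL model."""
    return c + (1 - c) / (1 + math.exp(-D * a * (theta - b)))

def item_info(theta, a, b, c):
    """3PL item information: D^2 a^2 * (Q/P) * ((P - c) / (1 - c))^2."""
    p = p_3pl(theta, a, b, c)
    return (D * a) ** 2 * ((1 - p) / p) * ((p - c) / (1 - c)) ** 2

def tif(theta, items):
    """Test information: sum of item information."""
    return sum(item_info(theta, a, b, c) for a, b, c in items)

def standard_error(theta, items):
    """Standard error of measurement at theta: 1 / sqrt(information)."""
    return 1 / math.sqrt(tif(theta, items))

# Hypothetical (a, b, c) parameters
EASY, HARD = (1.0, -1.5, 0.2), (1.0, 1.5, 0.2)
for theta in (-1.5, 0.0, 1.5):
    print(theta, round(item_info(theta, *EASY), 3), round(item_info(theta, *HARD), 3))
```

The easy item contributes most of its information near theta = -1.5 and the hard item near 1.5.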

Example of a TIF

Why does the client care?
It is often desired that next year's forms are similar to this year's forms
Make sure tests are the correct difficulty (TCC, raw score cut points) & precision (TIF)
Match the TCCs, cut points, and TIFs of the two years

Why should the forms be similar?
Theoretically, we should be able to account for differences through equating (Liz)
However, we want the student experience to be similar from year to year
Don't want to give an easy test to the Class of '07 and a hard test to the Class of '08
Don't want to make this year's test less precise than last year's

Example: 2007 MCAS, Grade 10 Math

Proposed 2007 TCC was lower than last year's
Solution: Replace some hard items with easy items

Example, Continued

Proposed 2007 TIF had less info at low abilities, more info at high abilities
Solution:
Replace some hard items with easy items
Use hard items with lower PBS, easy items with higher PBS

Example, Continued
RAW SCORE CUTS
          OLD   PROPOSED
Cut 1     20    15
Cut 2     33    28
Cut 3     45    41

Proposed 2007 raw score cuts lower than 2006 raw score cuts
Solution: Replace some hard items with easy items

Guide to making changes
Some rules of thumb for different problems:

            TCC                TIF                  Cuts
Too low     Add easier items   Add high-PBS items   Add easier items
Too high    Add harder items   Add low-PBS items    Add harder items

To sum up
Item Response Theory is useful in form pulling
TCCs, raw score cuts, and TIFs are often examined
Proposed values should be similar to the current year's
Tests shouldn't be too easy or too hard
Tests should be informative, but not too informative

It's helpful to know how we can change these things based on item stats

3. Technical Manuals
Things in technical manuals vary from program to program
Often see some of the following:

P-values and point-biserials (thanks Louis!)
Test reliabilities (thanks Louis!)
TCCs and TIFs (thanks Mike!)
DIF (thanks Won!)
Standard setting (thanks Liz and Abdullah!)
Equating (thanks in advance Liz!)
Inter-rater reliability (thanks for nothing!)
Decision consistency and accuracy (ditto)

Technical Manuals:
P-Values & Point-Biserials
You'll often see a table like this:

Subject   Stat   ALL           MC            OR
MAT       Dif    0.67 (0.15)   0.7 (0.13)    0.61 (0.16)
MAT       Disc   0.44 (0.08)   0.43 (0.07)   0.47 (0.1)
MAT       N      142           89            53
REA       Dif    0.67 (0.15)   0.71 (0.13)   0.52 (0.11)
REA       Disc   0.48 (0.1)    0.45 (0.09)   0.6 (0.05)
REA       N      85            70            15

Dif = difficulty (p-value); Disc = discrimination (point-biserial); N = number of items (ALL = MC + OR)

Technical Manuals:
Reliabilities (and other stats)

Louis said: Reliability is the correlation between scores on parallel forms. Higher reliability means greater consistency
You'll often see a table like this:

Subject   N       Points   Max   Mean     S.D.     Rel.
MAT       32219   65       65    40.341   13.693   0.934
REA       32087   52       52    31.446   10.869   0.895

MAT       32673   65       65    39.628   13.043   0.925
REA       32527   52       52    33.452   9.112    0.891

MAT       33532   66       66    31.546   13.68    0.917
REA       33402   52       52    29.153   8.64     0.876

N = number of students; each MAT/REA pair is one grade
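Louis's definition is the parallel-forms one, but a single administration has no second form, so a common substitute is an internal-consistency coefficient such as Cronbach's alpha. Whether the Rel. column above is alpha is not stated on the slide, so treat this as an illustrative sketch with hypothetical data.

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """scores: one row per student, one column per item (items may carry different points).
    alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    k = len(scores[0])
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical scores: 6 students, 4 items (last item worth up to 2 points)
scores = [
    [1, 1, 1, 2],
    [1, 1, 0, 2],
    [1, 0, 1, 1],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
print(round(cronbach_alpha(scores), 3))
```

Alpha rises when items covary (students who do well on one item tend to do well on the others), which is the single-form analogue of "higher reliability means greater consistency".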

Technical Manuals:
TCCs and TIFs
Give the TCC and TIF of each grade/content area

Technical Manuals: DIF
Won said: An item has DIF if, for students of equal ability, the probability of getting the item right depends on group membership (e.g., gender, ethnic group)
Measured Progress uses a method called the Standardized P-Difference
Comparing groups:
Male vs. Female
White vs. Black
White vs. Hispanic
Minimum 200 examinees in each group

DIF, Continued
A: [-0.05, 0.05]: negligible
B: [-0.10, -0.05) and (0.05, 0.10]: low
C: outside [-0.10, 0.10]: high
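The standardized p-difference compares the two groups' proportion correct within each matched score level, then averages those differences. This sketch assumes examinees are matched on total test score and each level is weighted by its number of focal-group examinees (the commonly described form of the method); the program's exact matching and weighting details are not given here. Data and names are hypothetical; the 200-examinee minimum and the A/B/C bands come from the slides.

```python
from collections import defaultdict

def std_p_dif(reference, focal, min_n=200):
    """Standardized p-difference for one item (focal minus reference).
    reference, focal: lists of (total_score, item_score) pairs, item_score 0/1.
    Each total-score level is weighted by its number of focal-group examinees."""
    if len(reference) < min_n or len(focal) < min_n:
        raise ValueError(f"need at least {min_n} examinees in each group")

    def by_level(group):
        count, right = defaultdict(int), defaultdict(int)
        for total, item in group:
            count[total] += 1
            right[total] += item
        return count, right

    n_ref, right_ref = by_level(reference)
    n_foc, right_foc = by_level(focal)
    num = den = 0.0
    for level, n in n_foc.items():
        if level not in n_ref:  # no reference examinees to compare with at this level
            continue
        diff = right_foc[level] / n - right_ref[level] / n_ref[level]
        num += n * diff
        den += n
    return num / den

def dif_category(d):
    """A/B/C bands from the slide."""
    if abs(d) <= 0.05:
        return "A (negligible)"
    if abs(d) <= 0.10:
        return "B (low)"
    return "C (high)"

# Hypothetical data: 200 examinees per group at two total-score levels
reference = [(10, 1)] * 50 + [(10, 0)] * 50 + [(20, 1)] * 80 + [(20, 0)] * 20
focal = [(10, 1)] * 42 + [(10, 0)] * 58 + [(20, 1)] * 72 + [(20, 0)] * 28
d = std_p_dif(reference, focal)
print(round(d, 3), dif_category(d))
```

At both score levels the focal group is 8 points lower in proportion correct, so the index is -0.08, which falls in band B.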

DIF, Continued
You may see a table like this:
[Table not reproduced]

Technical Manuals:
Standard Setting & Equating
Liz and Abdullah discussed standard setting
In technical manuals, you'll often see:
Report/summary of the standard setting process

Info about panelists (how many, who they are)
What method was used (e.g., Bookmark/Body of Work)
Cut points
Info about panelist evaluations

Equating: Come next week and find out!

Inter-rater reliability
When constructed-response items are rated by multiple scorers, how well do raters agree?
The more agreement, the better
Exact agreement: What % of the time do they give the same score?
Adjacent agreement: What % of the time are they off by 1?

Reading Open Response
Agreement     Exact   Adjacent   >1
Percentage    69.3    27.4       3.3
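Exact and adjacent agreement are simple tallies of score differences between two raters. A sketch with hypothetical scores for ten double-scored responses (the function name is hypothetical):

```python
def agreement(rater_1, rater_2):
    """Percent exact, adjacent (off by 1), and more than 1 point apart."""
    diffs = [abs(a - b) for a, b in zip(rater_1, rater_2)]
    n = len(diffs)
    exact = 100 * sum(d == 0 for d in diffs) / n
    adjacent = 100 * sum(d == 1 for d in diffs) / n
    return exact, adjacent, 100 - exact - adjacent

# Hypothetical scores (0-4) from two raters on the same ten responses
rater_1 = [0, 1, 2, 3, 4, 2, 3, 1, 0, 4]
rater_2 = [0, 2, 2, 1, 4, 3, 3, 1, 2, 4]
print(agreement(rater_1, rater_2))
```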

Decision Accuracy and Consistency:
Introduction
For most programs, four achievement levels, e.g., Below Basic, Basic, Proficient, Advanced
Decision accuracy: degree to which observed categorizations match true categorizations
Decision consistency: degree to which observed categorizations match those of a parallel form

Intuitive examples of accuracy
TRUE LEVEL: Proficient
OBSERVED LEVEL: Proficient
DIAGNOSIS: ACCURATE (GOOD)

TRUE LEVEL: Proficient
OBSERVED LEVEL: Below Basic
DIAGNOSIS: INACCURATE (BAD). False negative

TRUE LEVEL: Basic
OBSERVED LEVEL: Advanced
DIAGNOSIS: INACCURATE (BAD). False positive

Intuitive examples of consistency
OBSERVED LEVEL, Form 1: Basic
OBSERVED LEVEL, Form 2: Basic
DIAGNOSIS: CONSISTENT (GOOD)

OBSERVED LEVEL, Form 1: Basic
OBSERVED LEVEL, Form 2: Advanced
DIAGNOSIS: INCONSISTENT (BAD)

Decision Accuracy and Consistency:
Introduction
Livingston and Lewis (1995) proposed a method of estimating decision accuracy/consistency
For most programs, many stats are computed. We will give an example of each
The stats are all based on joint distributions
A joint distribution gives the proportion of times that 2 things both happen
What proportion of students are truly Basic and are observed as Below Basic?

Joint Distribution: True/Observed
Achievement Levels
(BB = Below Basic, B = Basic, P = Proficient, A = Advanced; rows = true status, columns = observed status)

True \ Observed   BB       B        P        A        Total
BB                0.0706   0.0176   0.0007   0.0000   0.0889
B                 0.0320   0.1058   0.0436   0.0000   0.1814
P                 0.0014   0.0532   0.4726   0.0734   0.6007
A                 0.0000   0.0000   0.0296   0.0993   0.1290
Total             0.1041   0.1766   0.5466   0.1728   1.0000

Overall accuracy: 0.7484
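Overall accuracy is the sum of the diagonal of the joint distribution: the students whose observed level equals their true level. The sketch types in the slide's table, reading rows as true level and columns as observed level (the column totals equal the Form 1 totals of the Observed/Observed table, which fixes that orientation). The function name is hypothetical.

```python
LEVELS = ["BB", "B", "P", "A"]  # Below Basic, Basic, Proficient, Advanced

# Joint distribution from the slide: rows = true level, columns = observed level
TRUE_OBS = [
    [0.0706, 0.0176, 0.0007, 0.0000],
    [0.0320, 0.1058, 0.0436, 0.0000],
    [0.0014, 0.0532, 0.4726, 0.0734],
    [0.0000, 0.0000, 0.0296, 0.0993],
]

def overall_agreement(joint):
    """Sum of the diagonal: students placed in the same level by both classifications."""
    return sum(joint[i][i] for i in range(len(joint)))

# Joint probability of being truly Basic and observed as Below Basic
print(TRUE_OBS[LEVELS.index("B")][LEVELS.index("BB")])
print(f"overall accuracy: {overall_agreement(TRUE_OBS):.4f}")
```

The diagonal of the rounded cells sums to 0.7483, one ten-thousandth below the slide's 0.7484, presumably because the published cells are rounded. The first print answers the earlier question: 0.0320 of students are truly Basic but observed as Below Basic.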

Joint Distribution: Observed/Observed
Achievement Levels
(rows = observed status on Form 1, columns = observed status on Form 2)

Form 1 \ Form 2   BB       B        P        A        Total
BB                0.0673   0.0310   0.0058   0.0000   0.1041
B                 0.0310   0.0820   0.0632   0.0003   0.1766
P                 0.0058   0.0632   0.4066   0.0709   0.5466
A                 0.0000   0.0003   0.0709   0.1015   0.1728
Total             0.1041   0.1766   0.5466   0.1728   1.0000

Overall consistency: 0.6574
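Overall consistency is computed the same way, as the diagonal sum of the Observed/Observed table (rows = Form 1, columns = Form 2). The table is symmetric, as expected when the two parallel forms are treated as interchangeable. The function name is hypothetical.

```python
LEVELS = ["BB", "B", "P", "A"]  # Below Basic, Basic, Proficient, Advanced

# Joint distribution from the slide: rows = Form 1 level, columns = Form 2 level
OBS_OBS = [
    [0.0673, 0.0310, 0.0058, 0.0000],
    [0.0310, 0.0820, 0.0632, 0.0003],
    [0.0058, 0.0632, 0.4066, 0.0709],
    [0.0000, 0.0003, 0.0709, 0.1015],
]

def overall_agreement(joint):
    """Sum of the diagonal: students placed in the same level on both forms."""
    return sum(joint[i][i] for i in range(len(joint)))

print(f"overall consistency: {overall_agreement(OBS_OBS):.4f}")
```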

Indices Conditional upon Level
Accuracy: proportion of students correctly classified, given true level
Consistency: proportion of students consistently classified by a parallel form, given observed level

Level   Accuracy   Consistency
BB      0.7945     0.6466
B       0.5831     0.4645
P       0.7868     0.7439
A       0.7702     0.5876
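Each conditional index divides a diagonal cell by its row total: accuracy by the true-level total, consistency by the Form 1 observed-level total. Recomputing from the two rounded joint-distribution tables reproduces the published indices to within 0.001. The function name is hypothetical.

```python
LEVELS = ["BB", "B", "P", "A"]  # Below Basic, Basic, Proficient, Advanced

# Slide tables: rows = true level, columns = observed level
TRUE_OBS = [
    [0.0706, 0.0176, 0.0007, 0.0000],
    [0.0320, 0.1058, 0.0436, 0.0000],
    [0.0014, 0.0532, 0.4726, 0.0734],
    [0.0000, 0.0000, 0.0296, 0.0993],
]
# rows = Form 1 level, columns = Form 2 level
OBS_OBS = [
    [0.0673, 0.0310, 0.0058, 0.0000],
    [0.0310, 0.0820, 0.0632, 0.0003],
    [0.0058, 0.0632, 0.4066, 0.0709],
    [0.0000, 0.0003, 0.0709, 0.1015],
]

def conditional_rates(joint):
    """Diagonal cell divided by its row total, for each level."""
    return [joint[i][i] / sum(joint[i]) for i in range(len(joint))]

accuracy = conditional_rates(TRUE_OBS)      # given true level
consistency = conditional_rates(OBS_OBS)    # given observed level on Form 1
for level, acc, con in zip(LEVELS, accuracy, consistency):
    print(f"{level:>2}  accuracy={acc:.4f}  consistency={con:.4f}")
```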

Indices at Cut Points
Accuracy & consistency at a specified cut point
Accuracy: What is the chance that a student is classified on the correct side of a cut point?
Consistency: What is the chance that a student is classified on the same side of a cut point twice?

Cut     Accuracy   False Positive   False Negative   Consistency
BB:B    0.9483     0.0183           0.0334           0.9264
B:P     0.9011     0.0443           0.0546           0.8612
P:A     0.8969     0.0734           0.0296           0.8575
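At a cut point, the four levels collapse into "below" and "at or above". False positives are students truly below the cut but observed above it, false negatives the reverse, and accuracy is 1 minus both. Consistency is 1 minus the proportion the two forms place on opposite sides of the cut. A sketch using the slide's two joint-distribution tables (the function name and cut indexing are hypothetical):

```python
LEVELS = ["BB", "B", "P", "A"]  # Below Basic, Basic, Proficient, Advanced

# Slide tables: rows = true level, columns = observed level
TRUE_OBS = [
    [0.0706, 0.0176, 0.0007, 0.0000],
    [0.0320, 0.1058, 0.0436, 0.0000],
    [0.0014, 0.0532, 0.4726, 0.0734],
    [0.0000, 0.0000, 0.0296, 0.0993],
]
# rows = Form 1 level, columns = Form 2 level
OBS_OBS = [
    [0.0673, 0.0310, 0.0058, 0.0000],
    [0.0310, 0.0820, 0.0632, 0.0003],
    [0.0058, 0.0632, 0.4066, 0.0709],
    [0.0000, 0.0003, 0.0709, 0.1015],
]

def cut_indices(true_obs, obs_obs, cut):
    """cut = index of the first level at or above the cut (1 = BB:B, 2 = B:P, 3 = P:A).
    Returns (accuracy, false positive, false negative, consistency)."""
    n = len(true_obs)
    below, above = range(cut), range(cut, n)
    false_pos = sum(true_obs[i][j] for i in below for j in above)  # truly below, observed above
    false_neg = sum(true_obs[i][j] for i in above for j in below)  # truly above, observed below
    split = (sum(obs_obs[i][j] for i in below for j in above)
             + sum(obs_obs[i][j] for i in above for j in below))   # forms disagree on side
    return 1 - false_pos - false_neg, false_pos, false_neg, 1 - split

for cut in (1, 2, 3):
    name = f"{LEVELS[cut - 1]}:{LEVELS[cut]}"
    print(name, [round(x, 4) for x in cut_indices(TRUE_OBS, OBS_OBS, cut)])
```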

To sum up
Lots of stuff in technical manuals
Both classical test theory material (p-values, point-biserials, reliabilities) & IRT material (TCCs, TIFs, equating) are important to understand
Hopefully, these seminars have helped familiarize you with their contents

4. Standard Setting

Comes up all the time outside Psychoville
Should be a perfect topic for this talk, but Liz and Abdullah already did a wonderful job

4. Standard Setting
Standard setting is the process of recommending cut scores between achievement levels

Advanced (A)
    Cut point 3
Proficient (P)
    Cut point 2
Below Proficient (BP)
    Cut point 1
Failing (F)

Focus on one FAQ in Bookmark:
How do we determine the arrangement of items in the ordered item booklets?

Brief Review of Bookmark
Each panelist makes use of the ordered item booklet (OIB)
Items in the OIB are presented from easiest to hardest
One page per MC item
Panelists' job is to place a bookmark in the OIB for each cut
For a given cut, where do panelists place a bookmark?
Where they think borderline students would no longer have a 2/3 chance (or better) of a correct answer

Abdullah said: Cut points are derived from bookmark placements

A Very Frequently Asked Question
First, an FMC: "You messed up the order of the items!"
Then, the FAQ: "Well, how did you determine the order?"
Important: Order is based on actual student performance
We use the concept of IRT

Two MC items: Which is easier?
[Figure: two item curves, labeled "Easier item" and "Harder item"]

Depending on the IRT model, this issue can become quite complex

An Intuitive Explanation
An easy item: An item that even low-ability students get right a high proportion of the time
That is, students with small theta values tend to get it right
Which item has the smallest theta value corresponding to a high probability of a correct answer?
How high a probability? Use 2/3, for consistency with the bookmark placement rule

IN SUM: The easiest item is the one with the smallest theta corresponding to p = 2/3
The hardest has the largest theta corresponding to p = 2/3
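Under a specific IRT model the 2/3 criterion has a closed form. Assuming the 3PL model with D = 1.7 (the slides don't name the model) and hypothetical item parameters, solving c + (1 - c) / (1 + exp(-D a (theta - b))) = 2/3 for theta gives theta = b + ln((2/3 - c) / (1/3)) / (D a). The booklet order is then a sort on that theta.

```python
import math

D = 1.7  # logistic scaling constant (assumed)

def p_3pl(theta, a, b, c):
    """Probability of a correct answer under the 3PL model."""
    return c + (1 - c) / (1 + math.exp(-D * a * (theta - b)))

def rp_theta(a, b, c, rp=2/3):
    """Theta at which a 3PL item is answered correctly with probability rp (needs c < rp)."""
    if c >= rp:
        raise ValueError("guessing parameter must be below the response probability")
    return b + math.log((rp - c) / (1 - rp)) / (D * a)

# Hypothetical (a, b, c) parameters
items = {
    "item 1": (1.2, 0.50, 0.20),
    "item 2": (1.5, 0.55, 0.25),
    "item 3": (1.0, -0.30, 0.15),
}
order = sorted(items, key=lambda name: rp_theta(*items[name]))
for name in order:
    print(name, round(rp_theta(*items[name]), 3))
```

Here item 2 has a higher b than item 1 but still lands earlier in the booklet, because its discrimination and guessing parameters differ. That is one way the ordering "can become quite complex" depending on the model.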

Use the 2/3 Criterion

Easiest to hardest: Orange, green, red, purple, blue
Thetas: -0.6, -0.2, 0.3, 0.8, 1.2

How about polytomous items?
A polytomous item is one that has more than 2 possible scores
MC items are dichotomous (0/1), not polytomous
Example of polytomous: OR item scored 0, 1, 2, 3, 4

Such an OR item is in the OIB four times, once for each score point 1, 2, 3, 4
Where do you put this item's 4 pages in the OIB?

Incorporating polytomous items
Just as with dichotomous items, we use IRT
What theta do you need to have a 2/3 chance of getting a 1 or better? 2 or better? 3 or better? 4?
The theta must increase as the score increases
Suppose the results are: -0.4, 0.4, 0.6, 1.8
Easiest to hardest: Orange, green, red, purple, blue
Thetas: -0.6, -0.2, 0.3, 0.8, 1.2
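For an OR item, the same 2/3 criterion is applied to the probability of scoring k or better. The slides don't name the polytomous model; this sketch assumes the generalized partial credit model (D = 1.7, hypothetical parameters) and finds each theta by bisection, which works because that probability rises with theta. It then merges the slide's MC thetas with the OR item's four page thetas, reading the lists as -0.6, -0.2, 0.3, 0.8, 1.2 and -0.4, 0.4, 0.6, 1.8 (minus signs restored, since both lists must increase).

```python
import math

D = 1.7  # logistic scaling constant (assumed)

def gpcm_probs(theta, a, steps):
    """Category probabilities P(X = 0..m) under the generalized partial credit model.
    steps: step difficulties b_1..b_m."""
    exponents = [0.0]                     # category 0
    for b in steps:
        exponents.append(exponents[-1] + D * a * (theta - b))
    top = max(exponents)                  # subtract the max to avoid overflow
    weights = [math.exp(e - top) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]

def p_at_least(theta, a, steps, k):
    """Probability of scoring k or better."""
    return sum(gpcm_probs(theta, a, steps)[k:])

def rp_theta(prob_at, rp=2/3, lo=-8.0, hi=8.0, tol=1e-9):
    """Bisection for the theta where an increasing curve prob_at(theta) reaches rp."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if prob_at(mid) < rp:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def page_thetas(a, steps, rp=2/3):
    """One OIB page per score point k = 1..m: theta for a 2/3 chance of k or better."""
    return [rp_theta(lambda t, k=k: p_at_least(t, a, steps, k), rp)
            for k in range(1, len(steps) + 1)]

# Hypothetical OR item (a = 0.9, four step difficulties)
print([round(t, 2) for t in page_thetas(a=0.9, steps=[-1.0, 0.2, 0.5, 1.6])])

# Merging with the slide's values (minus signs restored)
pages = {"orange": -0.6, "green": -0.2, "red": 0.3, "purple": 0.8, "blue": 1.2}
for k, theta in enumerate([-0.4, 0.4, 0.6, 1.8], start=1):
    pages[f"OR item, score {k}"] = theta
print(sorted(pages, key=pages.get))
```

The merged order places the OR item's score-1 page between orange and green, its score-2 and score-3 pages between red and purple, and its score-4 page after blue.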