Reliability Validity Utility HANDOUTS
Construct validity is a judgment about the appropriateness of inferences drawn from test scores regarding individual standings on a variable called a construct. A construct is an informed, scientific idea developed or hypothesized to describe or explain behavior.

Constructs are unobservable, presupposed (underlying) traits that a test developer may invoke to describe test behavior or criterion performance.
- The researcher forms hypotheses about the expected behavior of high scorers and low scorers on the test.
- These hypotheses give rise to a tentative theory about the nature of the construct the test was designed to measure.
- If the test is a valid measure of the construct, then high scorers and low scorers will behave as predicted by the theory.

If high scorers and low scorers do not behave as predicted:
- The investigator will need to reexamine the nature of the construct itself or the hypotheses made about it.
- The test may simply not measure the construct.
- One statistical procedure may have been more appropriate than another, given the particular assumptions.
Although confirming evidence contributes to a judgment that a test is a valid measure of a construct, evidence to the contrary can also be useful.

Bias is a factor inherent in a test that systematically prevents accurate, impartial measurement. Test bias can arise more from the study design than from the test itself.

Rating error
A rating is a numerical or verbal judgment (or both) that places a person or an attribute along a continuum identified by a scale of numerical or word descriptors known as a rating scale. Simply stated, a rating error is a judgment resulting from the intentional or unintentional misuse of a rating scale. Thus, for example, a leniency error (also known as a generosity error) is, as its name implies, an error in rating that arises from the tendency on the part of the rater to be lenient in scoring, marking, and/or grading.

Severity error is a type of rating error in which the ratings are consistently overly negative, particularly with regard to the performance or ability of the participants. It is a type of error that can occur in psychometric assessments. Severity error is the opposite of leniency error, which is a type of rating error in which the ratings are consistently overly positive.

Central tendency error
Here the rater, for whatever reason, exhibits a general and systematic reluctance to give ratings at either the positive or the negative extreme.
Consequently, all of this rater's ratings would tend to cluster in the middle of the rating continuum.
Solution: use rankings, so that individuals are compared against one another rather than against an absolute scale.

Halo effect describes the fact that, for some raters, some ratees can do no wrong. More specifically, a halo effect may also be defined as a tendency to give a particular ratee a higher rating than he or she objectively deserves because of the rater's failure to discriminate among conceptually distinct and potentially independent aspects of a ratee's behavior.

Test Fairness
Unlike the technically complex nature of test bias, concerns about test fairness are often tied to values. While test bias can be addressed with precision, fairness is subjective and can lead to ongoing debates among people with different viewpoints. In the context of psychometrics, fairness is defined as the degree to which a test is employed impartially, justly, and equitably.

CHAPTER 7: UTILITY

UTILITY: usefulness or practical value of testing to improve efficiency.

stringent application process for airline personnel.
• Benefits
o Profits, gains, advantages
o (e.g.) more stringent hiring policy -> more productive employees
o (e.g.) maintaining a successful academic environment at a university

UTILITY ANALYSIS

What is Utility Analysis?
- a family of techniques that entail a cost-benefit analysis designed to yield information relevant to a decision about the usefulness and/or practical value of a tool of assessment.

How Is a Utility Analysis Conducted?
The objective of the analysis dictates what sort of information will be required as well as the specific methods to be used.

Expectancy Data
o Expectancy table: provides an indication of the likelihood that a test taker will score within some interval of scores on a criterion measure
o Used to weigh costs against benefits
Brogden-Cronbach-Gleser formula
o Utility gain: estimate of the benefit of using a particular test or selection method
o Most simply, benefits minus costs
o Productivity gain: estimated increase in work output
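
The Brogden-Cronbach-Gleser formula mentioned in this handout is commonly written as utility gain = (N)(T)(r_xy)(SD_y)(Z_m) - (N)(C): the number of people selected, times their average tenure, times the test's validity coefficient, times the standard deviation of job performance in dollars, times the mean standardized test score of those selected, minus the number of applicants tested times the cost of testing each one. The following is only a minimal sketch of that "benefits minus costs" arithmetic. The function name, parameter names, and every number in the example are hypothetical and do not come from the handout.

```python
def utility_gain(n_selected, tenure_years, validity, sd_performance_dollars,
                 mean_z_selected, n_tested, cost_per_applicant):
    """Return estimated utility gain as benefits minus costs.

    Benefit term: N * T * r_xy * SD_y * mean Z of the selected applicants.
    Cost term:    number of applicants tested * cost of testing each one.
    """
    benefit = (n_selected * tenure_years * validity
               * sd_performance_dollars * mean_z_selected)
    cost = n_tested * cost_per_applicant
    return benefit - cost


# Hypothetical illustration only: 60 hires kept for 1.5 years on average,
# validity of .40, performance SD of $9,000, mean z-score of 1.0 among hires,
# and 120 applicants tested at $200 each.
gain = utility_gain(
    n_selected=60,
    tenure_years=1.5,
    validity=0.40,
    sd_performance_dollars=9000.0,
    mean_z_selected=1.0,
    n_tested=120,
    cost_per_applicant=200.0,
)
print(f"Estimated utility gain: ${gain:,.2f}")
```

A positive result suggests the estimated benefit of using the test exceeds what it costs to administer. A validity coefficient near zero drives the benefit term toward zero, leaving only the testing cost.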
o Fixed cut score: set with reference to a judgment concerning a minimum level of proficiency required to be included in a particular classification.
- Also called absolute cut scores

o Multiple cut scores: using two or more cut scores with reference to one predictor for the purpose of categorizing test takers
- (e.g.) having cut scores that mark an A, B, C, etc., all measured on the same predictor

o Multiple hurdles: for success, requires one individual to complete many tasks, with elimination at each level
- (e.g.) written application -> group interview -> personal interview, etc.

o Compensatory model of selection: the assumption is made that high scores on one attribute can compensate for low scores on another attribute.

Bookmark method: test items are listed, one per page, in ascending level of difficulty. An expert places a bookmark to mark the divide that separates test takers who have acquired minimal knowledge, skills, or abilities from those who have not.

Problems include the training of experts, possible floor and ceiling effects, and the optimal length of item booklets.

Other Methods
- discriminant analysis: family of statistical techniques used to shed light on the relationship between certain variables and two or more naturally occurring groups
- (e.g.) the relationship between test scores and people judged to be successful or unsuccessful at a job
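
To contrast the multiple cut score, multiple hurdle, and compensatory approaches described in this section, here is a minimal sketch. All cut scores, weights, stage names, and applicant scores are hypothetical values chosen for illustration. A weighted sum is only one possible way to realize a compensatory model.

```python
from typing import Dict, List, Tuple

# Hypothetical grade bands on a single predictor, highest cut first.
GRADE_CUTS: List[Tuple[float, str]] = [(90.0, "A"), (80.0, "B"), (70.0, "C")]


def classify(score: float) -> str:
    """Multiple cut scores: sort one predictor score into a category."""
    for cut, label in GRADE_CUTS:
        if score >= cut:
            return label
    return "F"


def passes_hurdles(scores: Dict[str, float],
                   hurdles: List[Tuple[str, float]]) -> bool:
    """Multiple hurdles: stages are checked in order; one failure eliminates."""
    for stage, cut in hurdles:
        if scores[stage] < cut:
            return False
    return True


def compensatory_score(scores: Dict[str, float],
                       weights: Dict[str, float]) -> float:
    """Compensatory model: a weighted sum lets a high score offset a low one."""
    return sum(weights[name] * scores[name] for name in weights)


# Hypothetical applicant: weak written application, strong interviews.
applicant = {"written": 55.0, "group_interview": 95.0, "personal_interview": 90.0}
hurdles = [("written", 60.0), ("group_interview", 60.0), ("personal_interview", 60.0)]
weights = {"written": 0.4, "group_interview": 0.3, "personal_interview": 0.3}

print(classify(85.0))
print(passes_hurdles(applicant, hurdles))
print(compensatory_score(applicant, weights) >= 60.0)
```

The same applicant is eliminated at the first hurdle but clears an overall cutoff under the compensatory model, because the strong interview scores offset the weak written application.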