
Measurement in nursing and health research
Validity of measures

Presented by: Wesam Almagharbeh
Supervised by: Muayyad Ahmad, PhD, RN
Introduction: Measurement
 The assignment of numbers to represent the amount of an attribute present in an object or person, using specific rules.
 L. L. Thurstone: "Whatever exists, exists in some amount and can be measured."
 The rules for measuring temperature, weight, and other physical attributes are widely known and accepted.
 Rules for measuring many variables, however, have to be invented, e.g., rules for measuring pain, satisfaction, and depression.
Measurement
 Measurement rules specify according to what criteria the numeric values are to be assigned to the characteristic of interest.
 In measuring attributes, researchers strive to use good, meaningful rules.
 With a new instrument, researchers seldom know in advance if their rules are the best possible.
Key Criteria for Evaluating
Quantitative Measures

 Reliability
 Validity
Validity
 Validity refers to the extent to which a measure achieves the purpose for which it was intended.
 "Validity is a unitary concept. It is the degree to which evidence and theory support the interpretation entailed by proposed use of tests" (AERA, APA, & NCME, 1985, 1999).
 The type of validity information to be obtained depends upon the aims or purposes for the measure rather than upon the type of measure.
Two frameworks of measurement
 Norm-referenced measures are employed when the interest is in evaluating a subject's performance relative to the performance of other subjects in some well-defined comparison group
 The focus is on the variance among subjects' performance
 Criterion-referenced measures are employed when the interest is in determining a subject's performance relative to, or whether or not the subject has acquired, a predetermined set of target behaviors
 The focus is on the variance between subject performance and the predetermined set of behaviors (process and outcome variables)
NORM-REFERENCED
VALIDITY PROCEDURES

 Four aspects:
 Content validity
 Face “logical” validity
 Construct validity
 Criterion-related validity
NORM-REFERENCED MEASURES
Content validity
 Its focus is on:
 Determining whether or not the items sampled for inclusion on the tool adequately represent the domain of content addressed by the instrument
 The relevance of the content domain to the proposed interpretation of scores obtained when the measure is employed
 Important for all measures (especially instruments designed to assess cognition)

[Figure: sample expert rating sheet, with items (e.g., "Manage resources effectively") each rated on a 1-to-5 scale]
NORM-REFERENCED MEASURES
Content validity
 Procedures: experts judge the specific items in terms of their relevance, sufficiency, and clarity in representing the concepts underlying the measure's development.
 When two judges are employed, the content validity index (CVI) is used (the proportion of items given a rating of quite/very relevant by both raters).
 When more than two experts rate the items on a measure, the alpha coefficient is used:
 0 indicates lack of agreement
 1.00 indicates complete agreement
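
For illustration, a minimal sketch (in Python, with illustrative names and data) of the two-rater CVI computation described above, assuming a 4-point relevance scale on which ratings of 3 (quite relevant) or 4 (very relevant) count as relevant:

# Minimal sketch: content validity index (CVI) for two raters.
# Assumes a 4-point relevance scale; ratings of 3 ("quite relevant")
# or 4 ("very relevant") count as relevant.

def cvi_two_raters(ratings_a, ratings_b, relevant=(3, 4)):
    """Proportion of items rated quite/very relevant by BOTH raters."""
    both = sum(1 for a, b in zip(ratings_a, ratings_b)
               if a in relevant and b in relevant)
    return both / len(ratings_a)

# Example: 10 items, two expert judges
rater_a = [4, 3, 4, 2, 4, 3, 4, 4, 1, 3]
rater_b = [4, 4, 3, 3, 4, 3, 2, 4, 2, 4]
print(cvi_two_raters(rater_a, rater_b))  # 7 of 10 items relevant to both -> 0.7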
NORM-REFERENCED MEASURES
Content validity
 Content validity depends largely on:
Selection, preparation, and use of experts
Optimal number of experts
NORM-REFERENCED MEASURES
Face “logical” validity
 Face validity is not validity in the true sense; it refers only to the appearance of the instrument to the layman
 When present, it does not provide evidence that the instrument actually measures what it purports to measure
NORM-REFERENCED MEASURES
Construct validity
 Refers to the extent to which an individual, event, or object actually possesses the characteristic being measured by the instrument
 The primary concern is the extent to which relationships
among items included in the measure are consistent with
the theory and concepts as operationally defined.
 The more abstract the concept, the more difficult it is to
establish the construct validity of the measure.
NORM-REFERENCED MEASURES
Construct validity
 Activities undertaken to obtain evidence for construct validity
include:
 examining item interrelationships
 investigations of the type and extent of the relationship between scores and external variables
 studies of the relationship between scores and other tools or methods intended to measure the same concepts
 examining relationships between scores and other measures of different constructs
 hypothesis testing of effects of specific interventions on scores
 comparison of scores of known groups of respondents
 testing hypotheses about expected differences in scores across specific groups of respondents
 ascertaining similarities and differences in responses given by members of distinct subgroups of respondents
Some Methods of Assessing
Construct Validity

 Contrasted groups approach
 Hypothesis testing approach
 Multitrait-multimethod approach
NORM-REFERENCED MEASURES
Construct validity
Contrasted groups approach
 Instrument is administered to groups expected to differ on the critical attribute because of some known characteristic (i.e., to be extremely high and extremely low in the characteristic being measured)
 E.g., fear of labor experiences between primiparas and multiparas
 If there is a significant difference between the mean scores:
 evidence for construct validity
 If no significant difference, three possibilities exist:
 (1) the test is unreliable;
 (2) the test is reliable, but not a valid measure of the characteristic;
 (3) the constructor's conception of the construct of interest is faulty and needs reformulation.
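
A minimal sketch of the contrasted-groups comparison, using an independent-samples t-test from SciPy; the groups, scores, and alpha level are illustrative:

# Minimal sketch: contrasted-groups approach.
# Compare mean fear-of-labor scores of two groups expected to differ
# (illustrative data; assumes scipy is installed).
from scipy import stats

primipara = [42, 38, 45, 40, 47, 44, 39, 46]   # expected higher fear
multipara = [30, 28, 35, 31, 27, 33, 29, 32]   # expected lower fear

t, p = stats.ttest_ind(primipara, multipara)
if p < 0.05:
    print(f"Significant difference (t={t:.2f}, p={p:.3f}): "
          "evidence for construct validity")
else:
    print("No significant difference: test may be unreliable, invalid, "
          "or the construct conception may need reformulation")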
NORM-REFERENCED MEASURES
Construct validity
Hypothesis testing approach
 Hypotheses are formulated according to theory or a conceptual framework
 Data are gathered to test the hypotheses
 It is examined whether the rationale underlying the instrument's construction is adequate to explain the data collected
NORM-REFERENCED MEASURES
Construct validity
Hypothesis testing approach
 According to theory, construct X is positively related to construct Y.
 Instrument A is a measure of construct X; instrument B is a measure of construct Y.
 Scores on A and B are correlated positively, as predicted by theory.
 Therefore, it is inferred that A and B are valid measures of X and Y.
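
A minimal sketch of this inference, assuming paired scores on instruments A and B and using a Pearson correlation; the data are illustrative:

# Minimal sketch: hypothesis-testing approach.
# Theory predicts construct X (instrument A) correlates positively
# with construct Y (instrument B). Illustrative paired scores.
from scipy import stats

scores_a = [12, 15, 9, 20, 17, 11, 18, 14]
scores_b = [34, 40, 28, 49, 45, 31, 44, 37]

r, p = stats.pearsonr(scores_a, scores_b)
print(f"r = {r:.2f}, p = {p:.3f}")
# A positive, significant r is consistent with (but does not prove)
# the claim that A and B validly measure X and Y.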
NORM-REFERENCED MEASURES
Construct validity
Multitrait-multimethod approach
 Is appropriately employed whenever it is feasible to:
1. Measure two or more different constructs
2. Use two or more different methodologies to measure each construct
3. Administer all instruments to every subject at the same time
4. Assume that performance on each instrument employed is independent, that is, not influenced by, biased by, or a function of performance on any other instrument
NORM-REFERENCED MEASURES
Construct validity
Multitrait-multimethod approach

 Depends largely on the size and pattern of the correlations
 Trait variance is the variability in a set of scores resulting from individual differences in the trait being measured.
 Method variance is variance resulting from individual differences in a subject's ability to respond appropriately to the type of measure used.
NORM-REFERENCED MEASURES
Construct validity
Multitrait-multimethod approach
 The reliability estimates (reliability diagonal)
 Convergent validity (validity diagonal)
 The heterotrait-monomethod coefficients should be lower than the values on the validity diagonal (construct validity)
 The heterotrait-heteromethod coefficients should be lower than the values on the validity diagonal (discriminant validity)
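
A minimal sketch of assembling the multitrait-multimethod correlation matrix, assuming two traits each measured by two methods; the traits, methods, and data are illustrative:

# Minimal sketch: multitrait-multimethod (MTMM) correlation matrix
# for two traits (anxiety, depression) x two methods (self-report,
# observer rating). Illustrative data; assumes numpy is installed.
import numpy as np

# rows = subjects; columns = trait-method combinations
data = np.array([
    # anx_self, anx_obs, dep_self, dep_obs
    [22, 20, 10, 11],
    [30, 28, 14, 15],
    [18, 19,  8,  9],
    [25, 27, 12, 14],
    [35, 33, 18, 17],
    [28, 26, 11, 13],
])
r = np.corrcoef(data, rowvar=False)

# The validity diagonal (same trait, different method) should exceed
# the heterotrait coefficients for convergent/discriminant evidence.
print("anx_self vs anx_obs (validity diagonal):", round(r[0, 1], 2))
print("anx_self vs dep_self (heterotrait-monomethod):", round(r[0, 2], 2))
print("anx_self vs dep_obs (heterotrait-heteromethod):", round(r[0, 3], 2))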
NORM-REFERENCED MEASURES
Construct validity

 CONFIRMATORY FACTOR ANALYSIS
NORM-REFERENCED MEASURES
Criterion-related validity

 When one wishes to infer from a measure an individual's probable standing on some other variable or criterion, criterion-related validity is of concern
 The degree to which the instrument is related to an external criterion
 Check the measure against a relevant criterion.
NORM-REFERENCED MEASURES
Criterion-related validity

 Two types of criterion-related validity:
 Predictive validity indicates the extent to which an
individual's future level of performance on a criterion can
be predicted from knowledge of performance on a prior
measure.
 Concurrent validity refers to the extent to which a
measure may be used to estimate an individual's present
standing on the criterion.
NORM-REFERENCED MEASURES
Criterion-related validity
Predictive Validity
 Looks at a measure's ability to predict something it should be able to predict

[Figure: Test → Criterion: scores on the test, obtained now, are used to predict later standing on the criterion]
NORM-REFERENCED MEASURES
Criterion-related validity
Concurrent Validity
 E.g., a measure of empowerment should show higher scores for managers and lower scores for their workers.
NORM-REFERENCED MEASURES
Criterion-related validity

 The difference between predictive and concurrent validity, then, is the difference in the timing of obtaining measurements on the criterion.
NORM-REFERENCED MEASURES
Criterion-related validity
Activities to obtain evidence for criterion-related validity:
 correlation studies of the type and extent of the relationships between scores and external variables
 studies of the extent to which scores predict future behavior, performance, or scores on measures obtained at a later point in time
 studies of the effectiveness of selection, placement, and/or classification decisions made on the basis of the scores resulting from the measure
 studies of differential group predictions or relationships
 assessment of validity generalization
NORM-REFERENCED MEASURES
Criterion-related validity

 Factors to be considered in planning and interpreting criterion-related studies relate to:
 (1) the target population,
 (2) the sample,
 (3) the criterion,
 (4) measurement reliability,
 (5) the need for a cross-validation
NORM-REFERENCED
ITEM-ANALYSIS PROCEDURES

 Item analysis: a procedure used to further assess the validity of a measure by separately evaluating each item to determine whether or not that item discriminates in the same manner in which the overall measure is intended to discriminate
 Three item-analysis procedures are:
 (1) item p level
 (2) discrimination index
 (3) item-response chart
NORM-REFERENCED
ITEM-ANALYSIS PROCEDURES
Item p level
 The p level (the difficulty level) is the proportion of correct responses to that item.
 It is determined by counting the number of subjects selecting the correct or desired response to a particular item and then dividing this number by the total number of subjects
 The closer the value of p is to 1.00, the easier the item
 The closer p is to zero, the more difficult the item
 p levels between 0.30 and 0.70 are desirable
 Extremely easy or extremely difficult items have very little power to discriminate or differentiate among subjects
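
A minimal sketch of the p-level computation just described, assuming responses coded 1 (correct) and 0 (incorrect); the data are illustrative:

# Minimal sketch: item p level (difficulty).
# responses: one entry per subject, 1 = correct, 0 = incorrect.

def p_level(responses):
    return sum(responses) / len(responses)

item_responses = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]  # 10 subjects
p = p_level(item_responses)
print(f"p = {p:.2f}")        # 0.70 -> within the desirable 0.30-0.70 band
print(0.30 <= p <= 0.70)     # True: item retains discriminating power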
NORM-REFERENCED
ITEM-ANALYSIS PROCEDURES
Discrimination Index

 The discrimination index (D) assesses an item's ability to discriminate
 If performance on a given item is a good predictor of performance on the overall measure, the item is said to be a good discriminator
NORM-REFERENCED
ITEM-ANALYSIS PROCEDURES
Discrimination Index
 To determine the D value for a given item (a sketch follows the list):
1. Rank all subjects' performance on the measure by using total scores from high to low.
2. Identify those individuals who ranked in the upper 25%.
3. Identify those individuals who ranked in the lower 25%.
4. Place the remaining scores aside.
5. Determine the proportion of respondents in the top 25% who answered the item correctly (Pu).
6. Determine the proportion of respondents in the lower 25% who answered the item correctly (PL).
7. Calculate D by subtracting PL from Pu.
8. Repeat steps 5 through 7 for each item on the measure.
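
A minimal sketch of steps 1 through 7 for a single item, assuming 0/1 item responses; the data are illustrative:

# Minimal sketch: discrimination index D for one item, following
# steps 1-7 above. Illustrative data; assumes responses are 0/1.

def discrimination_index(total_scores, item_correct):
    """total_scores: overall test score per subject.
    item_correct: 1/0 for this item, in the same subject order."""
    n = len(total_scores)
    # Step 1: rank subjects by total score, high to low
    order = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    k = max(1, n // 4)                    # size of the 25% tails
    upper, lower = order[:k], order[-k:]  # steps 2-4
    p_u = sum(item_correct[i] for i in upper) / k   # step 5
    p_l = sum(item_correct[i] for i in lower) / k   # step 6
    return p_u - p_l                                # step 7

totals = [95, 88, 82, 75, 70, 66, 60, 52]
item   = [1,  1,  1,  0,  1,  0,  0,  0]
print(discrimination_index(totals, item))  # Pu=1.0, PL=0.0 -> D=+1.0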


NORM-REFERENCED
ITEM-ANALYSIS PROCEDURES
Discrimination Index

 D values range from -1.00 to +1.00.
 D values greater than +0.20 are desirable for a norm-referenced measure
 A positive D value is desirable and indicates that the item is discriminating in the same manner as the total test
 A negative D value suggests that the item is not discriminating in the same way as the total test
NORM-REFERENCED
ITEM-ANALYSIS PROCEDURES
Item Response Chart
 Like D, the item-response chart assesses an item's ability to discriminate
 The respondents ranking in the upper and lower 25% are identified as in steps 1 through 4 for determining D
 Responses are cross-tabulated on two categories: high/low scorers and correct/incorrect for a given item
 Chi square: a value as large as or larger than 3.84 for a chi square with one degree of freedom is significant at the 0.05 level
 This means a significant difference exists in the proportion of high and low scorers who have correct responses. Items that meet this criterion should be retained, while those that do not should be discarded or modified to improve their ability to discriminate.
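
A minimal sketch of the item-response chart analyzed as a 2x2 chi-square, assuming counts of correct/incorrect responses among high and low scorers; the counts are illustrative:

# Minimal sketch: item-response chart as a 2x2 chi-square test.
# Rows: high scorers, low scorers; columns: correct, incorrect.
# Illustrative counts; assumes scipy is installed.
from scipy.stats import chi2_contingency

table = [[18, 2],   # high scorers: 18 correct, 2 incorrect
         [ 7, 13]]  # low scorers:   7 correct, 13 incorrect

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, df = {dof}, p = {p:.4f}")
# chi2 >= 3.84 with df = 1 -> significant at the 0.05 level: retain the item.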
CRITERION-REFERENCED
VALIDITY ASSESSMENT

 The validity of a criterion-referenced measure can be analyzed to ascertain if the measure functions in a manner consistent with its purposes
 Validity in terms of criterion-referenced interpretations relates to the extent to which scores result in the accurate classification of objects in regard to their domain status.
CRITERION-REFERENCED
VALIDITY ASSESSMENT

 Three aspects:
 Content validity
 Construct validity
 Criterion-related validity
CRITERION-REFERENCED
VALIDITY ASSESSMENT
Content Validity

 Focus is on the representativeness of a cluster of items in relation to the specified content domain
 For a measure to provide a clear description of domain status, the content domain must be consistent with its domain specifications or objective
 A prerequisite for all other types of validity
 An a posteriori content validity approach in criterion-referenced measurement uses content specialists to assess the quality and representativeness of the items within the test for measuring the content domain.
CRITERION-REFERENCED
Validity Assessment
by Content Specialists

 Specialists should be conversant with the domain treated in the measuring tool.
 Two or more content specialists are employed
 Item-objective congruence measure (item level)
 If more than one objective is used for a measure, the items that are measures of each objective usually are treated as separate tests when interpreting the results of validity assessments
CRITERION-REFERENCED
Validity Assessment
by Content Specialists

 Determination of Interrater Agreement


 Average Congruency Percentage
Validity Assessment
by Content Specialists
Determination of Interrater Agreement

 Content specialists are provided with the conceptual definition of the variable(s) to be measured with the set of items
 The content specialists then independently rate the relevance of each item to the specified content domain
 P₀ ≥ 0.80
 K ≥ 0.25
 The index of content validity (CVI)
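
A minimal sketch of the two agreement indices named above, reading P₀ as the observed proportion of agreement and K as Cohen's kappa (chance-corrected agreement); treating K as Cohen's kappa is an assumption, and the ratings are illustrative:

# Minimal sketch: observed agreement (P0) and Cohen's kappa (K)
# for two content specialists rating items 1 = relevant, 0 = not.

def p0_and_kappa(r1, r2):
    n = len(r1)
    p0 = sum(a == b for a, b in zip(r1, r2)) / n
    # chance agreement from each rater's marginal proportions
    p1_yes, p2_yes = sum(r1) / n, sum(r2) / n
    pc = p1_yes * p2_yes + (1 - p1_yes) * (1 - p2_yes)
    kappa = (p0 - pc) / (1 - pc)
    return p0, kappa

rater1 = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
rater2 = [1, 1, 0, 1, 0, 1, 0, 1, 1, 1]
p0, k = p0_and_kappa(rater1, rater2)
print(f"P0 = {p0:.2f}, K = {k:.2f}")  # check against P0 >= 0.80, K >= 0.25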


Validity Assessment
by Content Specialists
Determination of Interrater Agreement
 If P₀ and K, or either of these values, is too low, one or a combination of two problems could be operating:
 First, the items lack homogeneity, an item is ambiguous, or the domain is not well defined.
 E.g., agreement proportions of 0.67 (20 out of 30), 0.50 (15 out of 30), and 0.60 (18 out of 30).
 If the majority of the item writers had at least one item that was judged not/somewhat relevant (1 or 2) by the three content specialists, this would be support for lack of clarity in the domain definition.

Validity Assessment
by Content Specialists
Determination of Interrater Agreement

 Second, the problem may be due to the raters: they interpret the rating scale labels differently or use the rating scale differently
 E.g., 0.90 (27 out of 30), 0.93 (28 out of 30), and 0.93 (28 out of 30).
 Each of the items judged to be unlike the rest had been prepared by one item writer. In this case the flaw is not likely to be in the domain definition as specified, but in the interpretations of one item writer.
Validity Assessment
by Content Specialists
Determination of Interrater Agreement

 Refinement of the domain specifications is required in the first case.
 If the latter is the problem, the raters are given more explicit directions and guidelines in the use of the scale to reduce the chance of differential use.
 A clear and precise domain definition is essential:
 Domain specifications function to communicate what the results of measurements mean to those people who must interpret them,
 and what types of items and content should be included in the measure to those people who must construct the items.
CRITERION-REFERENCED
Validity Assessment
Average Congruency Percentage
 Content specialists judge the congruence of each item on a measure
 The proportion of items rated congruent by each judge is calculated and converted to a percentage.
 Then the mean percentage for all judges is calculated to obtain the average congruency percentage.
 E.g., if the percentages of congruent items for the judges are 95, 90, 100, and 100%, the average congruency percentage would be 96.25%.
 A percentage ≥ 90 is safely considered acceptable
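
A minimal sketch of the average congruency percentage, reproducing the example above; judgments are coded 1 (congruent) and 0 (not congruent):

# Minimal sketch: average congruency percentage (ACP).
# judgments[j] = list of 1/0 congruence judgments by judge j.

def average_congruency(judgments):
    percents = [100 * sum(j) / len(j) for j in judgments]
    return sum(percents) / len(percents)

# Example matching the slide: judges at 95%, 90%, 100%, 100% over 20 items
judges = [
    [1] * 19 + [0],      # 95%
    [1] * 18 + [0] * 2,  # 90%
    [1] * 20,            # 100%
    [1] * 20,            # 100%
]
print(f"ACP = {average_congruency(judges):.2f}%")  # 96.25% -> acceptable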


CRITERION-REFERENCED
CONSTRUCT VALIDITY
 Evidence of content validity is no guarantee that the measure is useful for its intended purpose.
 "We may say that a test's results are accurately descriptive of the domain of behaviors it is supposed to measure, it is quite another thing to say that the function to which you wish to put a descriptively valid test is appropriate" (Popham, 1978, p. 159).
 The major focus of construct validation is to establish support for the measure's ability to accurately categorize phenomena in accordance with the purpose for which the measure is being used.
CRITERION-REFERENCED
CONSTRUCT VALIDITY
Approaches used to assess the construct validity
 Experimental Methods and the Contrasted Groups Approach
 Decision Validity
CONSTRUCT VALIDITY
Approaches used to assess the construct validity
Experimental Methods and the Contrasted Groups
Approach

 The basic principles and procedures for these two approaches are the same for criterion-referenced measures as for norm-referenced measures.
CONSTRUCT VALIDITY
Approaches used to assess the construct validity
Decision Validity

 (1) A student may be allowed to progress to the next unit of instruction if test results indicate that the preceding unit has been mastered.
 (2) A woman in early labor may be allowed to ambulate if the nurse assesses, on pelvic examination, that the fetal head is engaged (as opposed to unengaged) in the pelvis.
 (3) A diabetic patient may be allowed to go home if the necessary skills for self-care have been mastered.
CONSTRUCT VALIDITY
Approaches used to assess the construct validity
Decision Validity

 The measurements obtained from criterion-referenced measures are often used to make decisions.
 "Criterion-referenced tests have emerged as instruments that provide data via which mastery decisions can be made, as opposed to providing the decision itself" (Hashway, 1998, p. 112).
 The decision validity of a measure is supported when the set standard(s) or criterion classifies subjects or objects with a high level of confidence.
CONSTRUCT VALIDITY
Approaches used to assess the construct validity
Decision Validity

 In most instances, two criterion groups are used to test the decision validity of a measure (low and high)
 E.g., "by summing the percentage of who exceed the performance standard and the percentage who did not"
 Decision validity can range from 0 to 100%, with high percentages reflecting high decision validity.
 Criterion groups for testing the decision validity of a measure also can be created
 E.g.
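
A minimal sketch of one way to quantify decision validity, assuming a high (mastery expected) and a low (mastery not expected) criterion group and a cut-score, and reading the quoted rule as the overall percentage correctly classified; this reading, the cut-score, and the data are assumptions:

# Minimal sketch: decision validity as percent correctly classified.
# high_group: scores of subjects expected to exceed the standard;
# low_group: scores of subjects expected not to.

def decision_validity(high_group, low_group, cut_score):
    correct = (sum(s >= cut_score for s in high_group)
               + sum(s < cut_score for s in low_group))
    return 100 * correct / (len(high_group) + len(low_group))

masters     = [85, 90, 78, 88, 92, 81]   # expected to pass
non_masters = [60, 55, 72, 58, 65, 79]   # expected to fail

print(f"{decision_validity(masters, non_masters, cut_score=75):.1f}%")
# 11 of 12 correctly classified -> 91.7%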
CONSTRUCT VALIDITY
Approaches used to assess the construct validity
Decision Validity

 Decision validity is influenced by:
 the quality of the measure
 the appropriateness of the criterion groups
 the characteristics of the subjects
 the level of performance or cut-score required
CRITERION-REFERENCED
Criterion-Related Validity

 Criterion-related validity studies of criterion-referenced measures are conducted in the same manner as for norm-referenced measures
CRITERION-REFERENCED
ITEM-ANALYSIS PROCEDURES

 Content specialists' ratings hold the most merit for assessing item validities and determining which items should be retained or discarded
 Empirical item-discrimination indices should be used primarily to detect aberrant items in need of revision or correction
Empirical Item-Analysis
Procedures
 Criterion-referenced item-analysis procedures determine the effectiveness of a specific test item in discriminating between subjects who have acquired the target behavior and those who have not.
 Two approaches are used for item-analysis procedures:
 (1) the criterion-groups technique, which also may be referred to as the uninstructed-instructed groups approach
 (2) the pretreatment/post-treatment measures approach, which in appropriate instances may be called the preinstruction/postinstruction measurements approach.
Advantages and disadvantages
 The criterion-groups technique is highly practical
 One disadvantage is the difficulty of defining criteria for identifying groups; another is the requirement of equivalence of the groups
 The pretreatment/post-treatment measures approach allows analysis of individual as well as group gains.
 Its disadvantages are impracticality, the amount of time that may be required, and the potential problem of a testing effect
CRITERION-REFERENCED
ITEM-ANALYSIS PROCEDURES

 Three item-analysis procedures are:
 (1) item-objective or item-subscale congruence
 (2) item difficulty
 (3) discrimination index
ITEM-ANALYSIS PROCEDURES
Item-Objective or
Item-Subscale Congruence
 Provides an index of the validity of an item based on the ratings of two or more content specialists
 In this method content specialists are directed to assign a value of +1, 0, or -1 to each item
 If an item definitely measures the objective or subscale, a value of +1 is assigned.
 A rating of 0 indicates that the judge is undecided about the item.
 The assignment of a -1 rating reflects a definite judgment that the item is not a measure of the objective or subscale.
ITEM-ANALYSIS PROCEDURES
Item-Objective or
Item-Subscale Congruence
 The limits of the index range from -1.00 to +1.00.
 An index of +1.00 will occur when perfect positive item-objective or subscale congruence exists, that is, when all content specialists assign a +1 to the item for its related objective or subscale and a -1 to the item for all other objectives or subscales that are measured by the tool.
 An index of -1.00 represents the worst possible value of the index and occurs when all content specialists assign a -1 to the item for what was expected to be its related objective or subscale and a +1 to the item for all other objectives or subscales.
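
A minimal sketch of an item-objective congruence index with the extreme behavior described above. The formula used here (index = N/(2N-2) times the mean rating on the target objective minus the grand mean across all objectives, with N the number of objectives) follows the index commonly attributed to Rovinelli and Hambleton; assuming that exact computation is intended here is an assumption, and the ratings are illustrative:

# Minimal sketch: item-objective congruence for one item.
# ratings[k] = list of specialists' ratings (+1, 0, -1) of the item
# against objective k; the target objective is the item's intended one.

def congruence_index(ratings, target=0):
    n_obj = len(ratings)                      # number of objectives (N)
    means = [sum(r) / len(r) for r in ratings]
    grand_mean = sum(means) / n_obj
    # scales to +1.00 / -1.00 at the perfect / worst-case rating patterns
    return (n_obj / (2 * n_obj - 2)) * (means[target] - grand_mean)

# Three specialists, two objectives; perfect congruence with objective 0
ratings = [
    [+1, +1, +1],   # target objective
    [-1, -1, -1],   # other objective
]
print(congruence_index(ratings))   # 1.0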
ITEM-ANALYSIS PROCEDURES
Item-Objective or
Item-Subscale Congruence
 The index does not depend on the number of content specialists used or on the number of objectives measured by the test or questionnaire.
 The tool must include more than one objective or subscale in order for this procedure to be used.
 A cut-off score is derived by the test developer.
 This is done by creating the poorest set of content specialists' ratings that the developer would still accept, and computing the index for that pattern.
 Items below the cut-off score are nonvalid and are discarded from the measure or analyzed and revised to improve their validity.
 Items above the cut-off score are considered valid.
ITEM-ANALYSIS PROCEDURES
Item Difficulty

 The purpose is to examine the difficulty level of items and compare it between criterion groups
 The approaches to calculating item p levels and their interpretation were discussed above
 The item p level should be higher for the group that is known to possess more of a specified trait or attribute than for the group known to possess less
ITEM-ANALYSIS PROCEDURES
Item Discrimination

 The focus is on the measurement of performance changes (e.g., pretest/posttest) or differences (e.g., experienced/inexperienced) between the criterion groups.
 Referred to as D'
 It is directly related to the property of decision validity
 Items with high positive discrimination indices improve the decision validity of a test.
ITEM-ANALYSIS PROCEDURES
Item Discrimination

 Criterion groups difference index (CGDI)
 Pretreatment/post-treatment measurements approach indices
ITEM-ANALYSIS PROCEDURES
Item Discrimination

 The criterion groups difference index (CGDI) is the proportion of respondents in the group known to have less of the trait or attribute of interest who answered the item appropriately or correctly subtracted from the proportion of respondents in the group known to possess more of the trait or attribute of interest who answered it correctly.
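
A minimal sketch of the CGDI computation: the item p level in the "more of the trait" group minus the p level in the "less of the trait" group. The data are illustrative:

# Minimal sketch: criterion groups difference index (CGDI).
# Each list holds 1/0 item responses for one criterion group.

def cgdi(more_group, less_group):
    p_more = sum(more_group) / len(more_group)
    p_less = sum(less_group) / len(less_group)
    return p_more - p_less

experienced   = [1, 1, 1, 0, 1, 1, 1, 1]  # group with more of the trait
inexperienced = [0, 1, 0, 0, 1, 0, 0, 1]  # group with less of the trait
print(cgdi(experienced, inexperienced))   # 0.875 - 0.375 = 0.5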
ITEM-ANALYSIS PROCEDURES
Item Discrimination

 Pretreatment/post-treatment measurements approach
 Three item-discrimination indices are:
 (1) pretest/posttest difference
 (2) individual gain
 (3) net gain
ITEM-ANALYSIS PROCEDURES
Item Discrimination
 The pretest/posttest difference index (PPDI) is the proportion of respondents who answered the item correctly on the posttest minus the proportion who responded to the item correctly on the pretest
 The individual gain index (IGI) is the proportion of respondents who answered the item incorrectly on the pretest and correctly on the posttest
 The net gain index (NGI) is the proportion of respondents who answered the item incorrectly on both occasions subtracted from the IGI.
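
A minimal sketch of the three indices just defined, assuming paired 1/0 responses per respondent on the pretest and posttest; the data are illustrative:

# Minimal sketch: PPDI, IGI, and NGI for one item.
# pre[i], post[i] = 1/0 responses of respondent i before/after treatment.

def ppdi_igi_ngi(pre, post):
    n = len(pre)
    ppdi = sum(post) / n - sum(pre) / n
    igi = sum(1 for a, b in zip(pre, post) if a == 0 and b == 1) / n
    wrong_both = sum(1 for a, b in zip(pre, post) if a == 0 and b == 0) / n
    ngi = igi - wrong_both
    return ppdi, igi, ngi

pre  = [0, 0, 1, 0, 0, 1, 0, 0]
post = [1, 1, 1, 0, 1, 1, 1, 0]
print(ppdi_igi_ngi(pre, post))  # PPDI = 0.50, IGI = 0.50, NGI = 0.25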
ITEM-ANALYSIS PROCEDURES
Item Discrimination

 NGI provides the most conservative estimate of item discrimination and uses more information.
 The range of values for each of the indices discussed above is -1.00 to +1.00,
 except for IGI, which has a range of 0 to +1.00.
 A high positive index for each of these item-discrimination indices is desirable.
