
Measurement in nursing and health research
Validity of measures

Presented by: Wesam Almagharbeh
Supervised by: Muayyad Ahmad, PhD, RN
Introduction: Measurement
 The assignment of numbers to represent the amount of an attribute present in an object or person, using specific rules.
 L. L. Thurstone: "Whatever exists, exists in some amount and can be measured."
 The rules for measuring temperature, weight, and other physical attributes are widely known and accepted.
 Rules for measuring many variables, however, have to be invented, e.g., rules for measuring pain, satisfaction, and depression.
Measurement
 Measurement rules specify according to what criteria the numeric values are to be assigned to the characteristic of interest.
 In measuring attributes, researchers strive to use good, meaningful rules.
 With a new instrument, researchers seldom know in advance if their rules are the best possible.
Key Criteria for Evaluating
Quantitative Measures

 Reliability
 Validity
Validity
 Validity refers to the extent to which a measure achieves the purpose for which it was intended.
 "Validity is a unitary concept. It is the degree to which evidence and theory support the interpretation entailed by proposed use of tests" (AERA, APA, & NCME, 1985, 1999).
 The type of validity information to be obtained depends upon the aims or purposes for the measure rather than upon the type of measure.
Two frameworks of measurement
 Norm-referenced measures are employed when the interest is in evaluating a subject's performance relative to the performance of other subjects in some well-defined comparison group
 The focus is on the variance among subjects' performance
 Criterion-referenced measures are employed when the interest is in determining a subject's performance relative to, or whether or not the subject has acquired, a predetermined set of target behaviors
 The focus is on the variance between subject performance and the predetermined set of behaviors (process and outcome variables)
NORM-REFERENCED
VALIDITY PROCEDURES

 Four aspects:
 Content validity
 Face “logical” validity
 Construct validity
 Criterion-related validity
NORM-REFERENCED MEASURES
Content validity
 Its focus is on:
 Determining whether or not the items sampled for inclusion on the tool adequately represent the domain of content addressed by the instrument
 The relevance of the content domain to the proposed interpretation of scores obtained when the measure is employed
 Important for all measures (especially instruments designed to assess cognition)

[Figure: sample expert rating sheet, with items (e.g., "Manage resources effectively") each rated on a 1-to-5 scale]
NORM-REFERENCED MEASURES
Content validity
 Procedures: experts judge the specific items in terms of their relevance, sufficiency, and clarity in representing the concepts underlying the measure's development.
 When two judges are employed, the content validity index (CVI) is used (the proportion of items given a rating of quite/very relevant by both raters).
 When more than two experts rate the items on a measure, the alpha coefficient is used:
 0 indicates lack of agreement
 1.00 indicates complete agreement
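
For illustration, a minimal sketch (in Python, with illustrative names and data) of the two-rater CVI computation described above, assuming a 4-point relevance scale on which ratings of 3 (quite relevant) or 4 (very relevant) count as relevant:

# Minimal sketch: content validity index (CVI) for two raters.
# Assumes a 4-point relevance scale; ratings of 3 ("quite relevant")
# or 4 ("very relevant") count as relevant.

def cvi_two_raters(ratings_a, ratings_b, relevant=(3, 4)):
    """Proportion of items rated quite/very relevant by BOTH raters."""
    both = sum(1 for a, b in zip(ratings_a, ratings_b)
               if a in relevant and b in relevant)
    return both / len(ratings_a)

# Example: 10 items, two expert judges
rater_a = [4, 3, 4, 2, 4, 3, 4, 4, 1, 3]
rater_b = [4, 4, 3, 3, 4, 3, 2, 4, 2, 4]
print(cvi_two_raters(rater_a, rater_b))  # 7 of 10 items relevant to both -> 0.7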
NORM-REFERENCED MEASURES
Content validity
 Content validity depends largely on:
Selection, preparation, and use of experts
Optimal number of experts
NORM-REFERENCED MEASURES
Face “logical” validity
 Face validity is not validity in the true sense; it refers only to the appearance of the instrument to the layman
 When present, it does not provide evidence that the instrument actually measures what it purports to measure
NORM-REFERENCED MEASURES
Construct validity
 Refers to the extent to which an individual, event, or object actually possesses the characteristic being measured by the instrument
 The primary concern is the extent to which relationships
among items included in the measure are consistent with
the theory and concepts as operationally defined.
 The more abstract the concept, the more difficult it is to
establish the construct validity of the measure.
NORM-REFERENCED MEASURES
Construct validity
 Activities undertaken to obtain evidence for construct validity
include:
 examining item interrelationships
 investigations of the type and extent of the relationship between scores and external variables
 studies of the relationship between scores and other tools or methods intended to measure the same concepts
 examining relationships between scores and other measures of different constructs
 hypothesis testing of effects of specific interventions on scores
 comparison of scores of known groups of respondents
 testing hypotheses about expected differences in scores across specific groups of respondents
 ascertaining similarities and differences in responses given by members of distinct subgroups of respondents
Some Methods of Assessing
Construct Validity

 Contrasted groups approach
 Hypothesis testing approach
 Multitrait-multimethod approach
NORM-REFERENCED MEASURES
Construct validity
Contrasted groups approach
 Instrument is administered to groups expected to differ on the critical attribute because of some known characteristic (i.e., to be extremely high and extremely low in the characteristic being measured)
 E.g., fear of labor experiences between primiparas and multiparas
 If there is a significant difference between the mean scores:
 evidence for construct validity
 If no significant difference, three possibilities exist:
 (1) the test is unreliable;
 (2) the test is reliable, but not a valid measure of the characteristic;
 (3) the constructor's conception of the construct of interest is faulty and needs reformulation.
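
A minimal sketch of the contrasted-groups comparison, using an independent-samples t-test from SciPy; the groups, scores, and alpha level are illustrative:

# Minimal sketch: contrasted-groups approach.
# Compare mean fear-of-labor scores of two groups expected to differ
# (illustrative data; assumes scipy is installed).
from scipy import stats

primipara = [42, 38, 45, 40, 47, 44, 39, 46]   # expected higher fear
multipara = [30, 28, 35, 31, 27, 33, 29, 32]   # expected lower fear

t, p = stats.ttest_ind(primipara, multipara)
if p < 0.05:
    print(f"Significant difference (t={t:.2f}, p={p:.3f}): "
          "evidence for construct validity")
else:
    print("No significant difference: test may be unreliable, invalid, "
          "or the construct conception may need reformulation")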
NORM-REFERENCED MEASURES
Construct validity
Hypothesis testing approach
 Hypotheses are formulated according to theory or a conceptual framework
 Data are gathered to test the hypotheses
 It is examined whether the rationale underlying the instrument's construction is adequate to explain the data collected
NORM-REFERENCED MEASURES
Construct validity
Hypothesis testing approach
 According to theory, construct X is positively related to construct Y.
 Instrument A is a measure of construct X; instrument B is a measure of construct Y.
 Scores on A and B are correlated positively, as predicted by theory.
 Therefore, it is inferred that A and B are valid measures of X and Y.
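
A minimal sketch of this inference, assuming paired scores on instruments A and B and using a Pearson correlation; the data are illustrative:

# Minimal sketch: hypothesis-testing approach.
# Theory predicts construct X (instrument A) correlates positively
# with construct Y (instrument B). Illustrative paired scores.
from scipy import stats

scores_a = [12, 15, 9, 20, 17, 11, 18, 14]
scores_b = [34, 40, 28, 49, 45, 31, 44, 37]

r, p = stats.pearsonr(scores_a, scores_b)
print(f"r = {r:.2f}, p = {p:.3f}")
# A positive, significant r is consistent with (but does not prove)
# the claim that A and B validly measure X and Y.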
NORM-REFERENCED MEASURES
Construct validity
Multitrait-multimethod approach
 Is appropriately employed whenever it is feasible to:
1. Measure two or more different constructs
2. Use two or more different methodologies to measure each construct
3. Administer all instruments to every subject at the same time
4. Assume that performance on each instrument employed is independent, that is, not influenced by, biased by, or a function of performance on any other instrument
NORM-REFERENCED MEASURES
Construct validity
Multitrait-multimethod approach

 Depends largely on the size and pattern of the correlations
 Trait variance is the variability in a set of scores resulting from individual differences in the trait being measured.
 Method variance is variance resulting from individual differences in a subject's ability to respond appropriately to the type of measure used.
NORM-REFERENCED MEASURES
Construct validity
Multitrait-multimethod approach
 The reliability estimates (reliability diagonal)
 Convergent validity (validity diagonal)
 The heterotrait-monomethod coefficients should be lower than the values on the validity diagonal (construct validity)
 The heterotrait-heteromethod coefficients should be lower than the values on the validity diagonal (discriminant validity)
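
A minimal sketch of assembling the multitrait-multimethod correlation matrix, assuming two traits each measured by two methods; the traits, methods, and data are illustrative:

# Minimal sketch: multitrait-multimethod (MTMM) correlation matrix
# for two traits (anxiety, depression) x two methods (self-report,
# observer rating). Illustrative data; assumes numpy is installed.
import numpy as np

# rows = subjects; columns = trait-method combinations
data = np.array([
    # anx_self, anx_obs, dep_self, dep_obs
    [22, 20, 10, 11],
    [30, 28, 14, 15],
    [18, 19,  8,  9],
    [25, 27, 12, 14],
    [35, 33, 18, 17],
    [28, 26, 11, 13],
])
r = np.corrcoef(data, rowvar=False)

# The validity diagonal (same trait, different method) should exceed
# the heterotrait coefficients for convergent/discriminant evidence.
print("anx_self vs anx_obs (validity diagonal):", round(r[0, 1], 2))
print("anx_self vs dep_self (heterotrait-monomethod):", round(r[0, 2], 2))
print("anx_self vs dep_obs (heterotrait-heteromethod):", round(r[0, 3], 2))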
NORM-REFERENCED MEASURES
Construct validity

 CONFIRMATORY FACTOR ANALYSIS
NORM-REFERENCED MEASURES
Criterion-related validity

 When one wishes to infer from a measure an individual's probable standing on some other variable or criterion, criterion-related validity is of concern
 The degree to which the instrument is related to an external criterion
 Check the measure against a relevant criterion.
NORM-REFERENCED MEASURES
Criterion-related validity

 Two types of criterion-related validity:
 Predictive validity indicates the extent to which an
individual's future level of performance on a criterion can
be predicted from knowledge of performance on a prior
measure.
 Concurrent validity refers to the extent to which a
measure may be used to estimate an individual's present
standing on the criterion.
NORM-REFERENCED MEASURES
Criterion-related validity
Predictive Validity
 Looks at a measure's ability to predict something it should be able to predict

[Figure: Test → Criterion: scores on the test, obtained now, are used to predict later standing on the criterion]
NORM-REFERENCED MEASURES
Criterion-related validity
Concurrent Validity
 E.g., a measure of empowerment should show higher scores for managers and lower scores for their workers.
NORM-REFERENCED MEASURES
Criterion-related validity

 The difference between predictive and concurrent validity, then, is the difference in the timing of obtaining measurements on the criterion.
NORM-REFERENCED MEASURES
Criterion-related validity
Activities to obtain evidence for criterion-related validity:
 correlation studies of the type and extent of the relationships between scores and external variables
 studies of the extent to which scores predict future behavior, performance, or scores on measures obtained at a later point in time
 studies of the effectiveness of selection, placement, and/or classification decisions made on the basis of the scores resulting from the measure
 studies of differential group predictions or relationships
 assessment of validity generalization
NORM-REFERENCED MEASURES
Criterion-related validity

 Factors to be considered in planning and interpreting criterion-related studies relate to:
 (1) the target population,
 (2) the sample,
 (3) the criterion,
 (4) measurement reliability,
 (5) the need for a cross-validation
NORM-REFERENCED
ITEM-ANALYSIS PROCEDURES

 Item analysis: a procedure used to further assess the validity of a measure by separately evaluating each item to determine whether or not that item discriminates in the same manner in which the overall measure is intended to discriminate
 Three item-analysis procedures are:
 (1) item p level
 (2) discrimination index
 (3) item-response chart
NORM-REFERENCED
ITEM-ANALYSIS PROCEDURES
Item p level
 The p level (the difficulty level) is the proportion of correct responses to that item.
 It is determined by counting the number of subjects selecting the correct or desired response to a particular item and then dividing this number by the total number of subjects
 The closer the value of p is to 1.00, the easier the item
 The closer p is to zero, the more difficult the item
 p levels between 0.30 and 0.70 are desirable
 Extremely easy or extremely difficult items have very little power to discriminate or differentiate among subjects
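
A minimal sketch of the p-level computation just described, assuming responses coded 1 (correct) and 0 (incorrect); the data are illustrative:

# Minimal sketch: item p level (difficulty).
# responses: one entry per subject, 1 = correct, 0 = incorrect.

def p_level(responses):
    return sum(responses) / len(responses)

item_responses = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]  # 10 subjects
p = p_level(item_responses)
print(f"p = {p:.2f}")        # 0.70 -> within the desirable 0.30-0.70 band
print(0.30 <= p <= 0.70)     # True: item retains discriminating power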
NORM-REFERENCED
ITEM-ANALYSIS PROCEDURES
Discrimination Index

 The discrimination index (D) assesses an item's ability to discriminate
 If performance on a given item is a good predictor of performance on the overall measure, the item is said to be a good discriminator
NORM-REFERENCED
ITEM-ANALYSIS PROCEDURES
Discrimination Index
 To determine the D value for a given item (a sketch follows the list):
1. Rank all subjects' performance on the measure by using total scores from high to low.
2. Identify those individuals who ranked in the upper 25%.
3. Identify those individuals who ranked in the lower 25%.
4. Place the remaining scores aside.
5. Determine the proportion of respondents in the top 25% who answered the item correctly (Pu).
6. Determine the proportion of respondents in the lower 25% who answered the item correctly (PL).
7. Calculate D by subtracting PL from Pu.
8. Repeat steps 5 through 7 for each item on the measure.
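
A minimal sketch of steps 1 through 7 for a single item, assuming 0/1 item responses; the data are illustrative:

# Minimal sketch: discrimination index D for one item, following
# steps 1-7 above. Illustrative data; assumes responses are 0/1.

def discrimination_index(total_scores, item_correct):
    """total_scores: overall test score per subject.
    item_correct: 1/0 for this item, in the same subject order."""
    n = len(total_scores)
    # Step 1: rank subjects by total score, high to low
    order = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    k = max(1, n // 4)                    # size of the 25% tails
    upper, lower = order[:k], order[-k:]  # steps 2-4
    p_u = sum(item_correct[i] for i in upper) / k   # step 5
    p_l = sum(item_correct[i] for i in lower) / k   # step 6
    return p_u - p_l                                # step 7

totals = [95, 88, 82, 75, 70, 66, 60, 52]
item   = [1,  1,  1,  0,  1,  0,  0,  0]
print(discrimination_index(totals, item))  # Pu=1.0, PL=0.0 -> D=+1.0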


NORM-REFERENCED
ITEM-ANALYSIS PROCEDURES
Discrimination Index

 D values range from -1.00 to +1.00.
 D values greater than +0.20 are desirable for a norm-referenced measure
 A positive D value is desirable and indicates that the item is discriminating in the same manner as the total test
 A negative D value suggests that the item is not discriminating in the same way as the total test
NORM-REFERENCED
ITEM-ANALYSIS PROCEDURES
Item Response Chart
 Like D, the item-response chart assesses an item's ability to discriminate
 The respondents ranking in the upper and lower 25% are identified as in steps 1 through 4 for determining D
 Responses are cross-tabulated on two categories: high/low scorers and correct/incorrect for a given item
 Chi square: a value as large as or larger than 3.84 for a chi square with one degree of freedom is significant at the 0.05 level
 This means a significant difference exists in the proportion of high and low scorers who have correct responses. Items that meet this criterion should be retained, while those that do not should be discarded or modified to improve their ability to discriminate.
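
A minimal sketch of the item-response chart analyzed as a 2x2 chi-square, assuming counts of correct/incorrect responses among high and low scorers; the counts are illustrative:

# Minimal sketch: item-response chart as a 2x2 chi-square test.
# Rows: high scorers, low scorers; columns: correct, incorrect.
# Illustrative counts; assumes scipy is installed.
from scipy.stats import chi2_contingency

table = [[18, 2],   # high scorers: 18 correct, 2 incorrect
         [ 7, 13]]  # low scorers:   7 correct, 13 incorrect

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, df = {dof}, p = {p:.4f}")
# chi2 >= 3.84 with df = 1 -> significant at the 0.05 level: retain the item.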
CRITERION-REFERENCED
VALIDITY ASSESSMENT

 The validity of a criterion-referenced measure can be analyzed to ascertain if the measure functions in a manner consistent with its purposes
 Validity in terms of criterion-referenced interpretations relates to the extent to which scores result in the accurate classification of objects in regard to their domain status.
CRITERION-REFERENCED
VALIDITY ASSESSMENT

 Three aspects:
 Content validity
 Construct validity
 Criterion-related validity
CRITERION-REFERENCED
VALIDITY ASSESSMENT
Content Validity

 Focus is on the representativeness of a cluster of items in relation to the specified content domain
 For a measure to provide a clear description of domain status, the content domain must be consistent with its domain specifications or objective
 A prerequisite for all other types of validity
 An a posteriori content validity approach in criterion-referenced measurement uses content specialists to assess the quality and representativeness of the items within the test for measuring the content domain.
CRITERION-REFERENCED
Validity Assessment
by Content Specialists

 Specialists should be conversant with the domain treated in the measuring tool.
 Two or more content specialists are employed
 Item-objective congruence measure (item level)
 If more than one objective is used for a measure, the items that are measures of each objective usually are treated as separate tests when interpreting the results of validity assessments
CRITERION-REFERENCED
Validity Assessment
by Content Specialists

 Determination of Interrater Agreement


 Average Congruency Percentage
Validity Assessment
by Content Specialists
Determination of Interrater Agreement

 Content specialists are provided with the conceptual definition of the variable(s) to be measured with the set of items
 The content specialists then independently rate the relevance of each item to the specified content domain
 P₀ ≥ 0.80
 K ≥ 0.25
 The index of content validity (CVI)
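
A minimal sketch of the two agreement indices named above, reading P₀ as the observed proportion of agreement and K as Cohen's kappa (chance-corrected agreement); treating K as Cohen's kappa is an assumption, and the ratings are illustrative:

# Minimal sketch: observed agreement (P0) and Cohen's kappa (K)
# for two content specialists rating items 1 = relevant, 0 = not.

def p0_and_kappa(r1, r2):
    n = len(r1)
    p0 = sum(a == b for a, b in zip(r1, r2)) / n
    # chance agreement from each rater's marginal proportions
    p1_yes, p2_yes = sum(r1) / n, sum(r2) / n
    pc = p1_yes * p2_yes + (1 - p1_yes) * (1 - p2_yes)
    kappa = (p0 - pc) / (1 - pc)
    return p0, kappa

rater1 = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
rater2 = [1, 1, 0, 1, 0, 1, 0, 1, 1, 1]
p0, k = p0_and_kappa(rater1, rater2)
print(f"P0 = {p0:.2f}, K = {k:.2f}")  # check against P0 >= 0.80, K >= 0.25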


Validity Assessment
by Content Specialists
Determination of Interrater Agreement
 If P₀ and K, or either of these values, is too low, one or a combination of two problems could be operating:
 First, the items lack homogeneity, an item is ambiguous, or the domain is not well defined.
 E.g., agreement proportions of 0.67 (20 out of 30), 0.50 (15 out of 30), and 0.60 (18 out of 30).
 If the majority of the item writers had at least one item that was judged not/somewhat relevant (1 or 2) by the three content specialists, this would be support for lack of clarity in the domain definition.

Validity Assessment
by Content Specialists
Determination of Interrater Agreement

 Second, the problem may be due to the raters: they interpret the rating scale labels differently or use the rating scale differently
 E.g., 0.90 (27 out of 30), 0.93 (28 out of 30), and 0.93 (28 out of 30).
 Each of the items judged to be unlike the rest had been prepared by one item writer. In this case the flaw is not likely to be in the domain definition as specified, but in the interpretations of one item writer.
Validity Assessment
by Content Specialists
Determination of Interrater Agreement

 Refinement of the domain specifications is required in the first case.
 If the latter is the problem, the raters are given more explicit directions and guidelines in the use of the scale to reduce the chance of differential use.
 A clear and precise domain definition is essential:
 Domain specifications function to communicate what the results of measurements mean to those people who must interpret them,
 and what types of items and content should be included in the measure to those people who must construct the items.
CRITERION-REFERENCED
Validity Assessment
Average Congruency Percentage
 Content specialists judge the congruence of each item on a measure
 The proportion of items rated congruent by each judge is calculated and converted to a percentage.
 Then the mean percentage for all judges is calculated to obtain the average congruency percentage.
 E.g., if the percentages of congruent items for the judges are 95, 90, 100, and 100%, the average congruency percentage would be 96.25%.
 A percentage ≥ 90 is safely considered acceptable
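
A minimal sketch of the average congruency percentage, reproducing the example above; judgments are coded 1 (congruent) and 0 (not congruent):

# Minimal sketch: average congruency percentage (ACP).
# judgments[j] = list of 1/0 congruence judgments by judge j.

def average_congruency(judgments):
    percents = [100 * sum(j) / len(j) for j in judgments]
    return sum(percents) / len(percents)

# Example matching the slide: judges at 95%, 90%, 100%, 100% over 20 items
judges = [
    [1] * 19 + [0],      # 95%
    [1] * 18 + [0] * 2,  # 90%
    [1] * 20,            # 100%
    [1] * 20,            # 100%
]
print(f"ACP = {average_congruency(judges):.2f}%")  # 96.25% -> acceptable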


CRITERION-REFERENCED
CONSTRUCT VALIDITY
 Evidence of content validity is no guarantee that the measure is useful for its intended purpose.
 "We may say that a test's results are accurately descriptive of the domain of behaviors it is supposed to measure, it is quite another thing to say that the function to which you wish to put a descriptively valid test is appropriate" (Popham, 1978, p. 159).
 The major focus of construct validation is to establish support for the measure's ability to accurately categorize phenomena in accordance with the purpose for which the measure is being used.
CRITERION-REFERENCED
CONSTRUCT VALIDITY
Approaches used to assess the construct validity
 Experimental Methods and the Contrasted Groups Approach
 Decision Validity
CONSTRUCT VALIDITY
Approaches used to assess the construct validity
Experimental Methods and the Contrasted Groups
Approach

 The basic principles and procedures for these two approaches are the same for criterion-referenced measures as for norm-referenced measures.
CONSTRUCT VALIDITY
Approaches used to assess the construct validity
Decision Validity

 (1) A student may be allowed to progress to the next unit of instruction if test results indicate that the preceding unit has been mastered.
 (2) A woman in early labor may be allowed to ambulate if the nurse assesses, on pelvic examination, that the fetal head is engaged (as opposed to unengaged) in the pelvis.
 (3) A diabetic patient may be allowed to go home if the necessary skills for self-care have been mastered.
CONSTRUCT VALIDITY
Approaches used to assess the construct validity
Decision Validity

 The measurements obtained from criterion-referenced measures are often used to make decisions.
 "Criterion-referenced tests have emerged as instruments that provide data via which mastery decisions can be made, as opposed to providing the decision itself" (Hashway, 1998, p. 112).
 The decision validity of a measure is supported when the set standard(s) or criterion classifies subjects or objects with a high level of confidence.
CONSTRUCT VALIDITY
Approaches used to assess the construct validity
Decision Validity

 In most instances, two criterion groups are used to test the decision validity of a measure (low and high)
 E.g., "by summing the percentage of who exceed the performance standard and the percentage who did not"
 Decision validity can range from 0 to 100%, with high percentages reflecting high decision validity.
 Criterion groups for testing the decision validity of a measure also can be created
 E.g.
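
A minimal sketch of one way to quantify decision validity, assuming a high (mastery expected) and a low (mastery not expected) criterion group and a cut-score, and reading the quoted rule as the overall percentage correctly classified; this reading, the cut-score, and the data are assumptions:

# Minimal sketch: decision validity as percent correctly classified.
# high_group: scores of subjects expected to exceed the standard;
# low_group: scores of subjects expected not to.

def decision_validity(high_group, low_group, cut_score):
    correct = (sum(s >= cut_score for s in high_group)
               + sum(s < cut_score for s in low_group))
    return 100 * correct / (len(high_group) + len(low_group))

masters     = [85, 90, 78, 88, 92, 81]   # expected to pass
non_masters = [60, 55, 72, 58, 65, 79]   # expected to fail

print(f"{decision_validity(masters, non_masters, cut_score=75):.1f}%")
# 11 of 12 correctly classified -> 91.7%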
CONSTRUCT VALIDITY
Approaches used to assess the construct validity
Decision Validity

 Decision validity is influenced by:
 the quality of the measure
 the appropriateness of the criterion groups
 the characteristics of the subjects
 the level of performance or cut-score required
CRITERION-REFERENCED
Criterion-Related Validity

 Criterion-related validity studies of criterion-referenced measures are conducted in the same manner as for norm-referenced measures
CRITERION-REFERENCED
ITEM-ANALYSIS PROCEDURES

 Content specialists' ratings hold the most merit for assessing item validities and determining which items should be retained or discarded
 Empirical item-discrimination indices should be used primarily to detect aberrant items in need of revision or correction
Empirical Item-Analysis
Procedures
 Criterion-referenced item-analysis procedures determine the effectiveness of a specific test item in discriminating between subjects who have acquired the target behavior and those who have not.
 Two approaches are used for item-analysis procedures:
 (1) the criterion-groups technique, which also may be referred to as the uninstructed-instructed groups approach
 (2) the pretreatment/post-treatment measures approach, which in appropriate instances may be called the preinstruction/postinstruction measurements approach.
Advantages and disadvantages
 The criterion-groups technique is highly practical
 One disadvantage is the difficulty of defining criteria for identifying groups; another is the requirement of equivalence of the groups
 The pretreatment/post-treatment measures approach allows analysis of individual as well as group gains.
 Its disadvantages are impracticality, the amount of time that may be required, and the potential problem of a testing effect
CRITERION-REFERENCED
ITEM-ANALYSIS PROCEDURES

 Three item-analysis procedures are:
 (1) item-objective or item-subscale congruence
 (2) item difficulty
 (3) discrimination index
ITEM-ANALYSIS PROCEDURES
Item-Objective or
Item-Subscale Congruence
 Provides an index of the validity of an item based on the ratings of two or more content specialists
 In this method content specialists are directed to assign a value of +1, 0, or -1 to each item
 If an item definitely measures the objective or subscale, a value of +1 is assigned.
 A rating of 0 indicates that the judge is undecided about the item.
 The assignment of a -1 rating reflects a definite judgment that the item is not a measure of the objective or subscale.
ITEM-ANALYSIS PROCEDURES
Item-Objective or
Item-Subscale Congruence
 The limits of the index range from -1.00 to +1.00.
 An index of +1.00 will occur when perfect positive item-objective or subscale congruence exists, that is, when all content specialists assign a +1 to the item for its related objective or subscale and a -1 to the item for all other objectives or subscales that are measured by the tool.
 An index of -1.00 represents the worst possible value of the index and occurs when all content specialists assign a -1 to the item for what was expected to be its related objective or subscale and a +1 to the item for all other objectives or subscales.
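
A minimal sketch of an item-objective congruence index with the extreme behavior described above. The formula used here (index = N/(2N-2) times the mean rating on the target objective minus the grand mean across all objectives, with N the number of objectives) follows the index commonly attributed to Rovinelli and Hambleton; assuming that exact computation is intended here is an assumption, and the ratings are illustrative:

# Minimal sketch: item-objective congruence for one item.
# ratings[k] = list of specialists' ratings (+1, 0, -1) of the item
# against objective k; the target objective is the item's intended one.

def congruence_index(ratings, target=0):
    n_obj = len(ratings)                      # number of objectives (N)
    means = [sum(r) / len(r) for r in ratings]
    grand_mean = sum(means) / n_obj
    # scales to +1.00 / -1.00 at the perfect / worst-case rating patterns
    return (n_obj / (2 * n_obj - 2)) * (means[target] - grand_mean)

# Three specialists, two objectives; perfect congruence with objective 0
ratings = [
    [+1, +1, +1],   # target objective
    [-1, -1, -1],   # other objective
]
print(congruence_index(ratings))   # 1.0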
ITEM-ANALYSIS PROCEDURES
Item-Objective or
Item-Subscale Congruence
 The index does not depend on the number of content specialists used or on the number of objectives measured by the test or questionnaire.
 The tool must include more than one objective or subscale in order for this procedure to be used.
 A cut-off score is derived by the test developer.
 This is done by creating the poorest set of content specialists' ratings that the developer would still accept, and computing the index for that pattern.
 Items below the cut-off score are nonvalid and are discarded from the measure or analyzed and revised to improve their validity.
 Items above the cut-off score are considered valid.
ITEM-ANALYSIS PROCEDURES
Item Difficulty

 The purpose is to examine the difficulty level of items and compare it between criterion groups
 The approaches to calculating item p levels and their interpretation were discussed above
 The item p level should be higher for the group that is known to possess more of a specified trait or attribute than for the group known to possess less
ITEM-ANALYSIS PROCEDURES
Item Discrimination

 The focus is on the measurement of performance changes (e.g., pretest/posttest) or differences (e.g., experienced/inexperienced) between the criterion groups.
 Referred to as D'
 It is directly related to the property of decision validity
 Items with high positive discrimination indices improve the decision validity of a test.
ITEM-ANALYSIS PROCEDURES
Item Discrimination

 Criterion groups difference index (CGDI)
 Pretreatment/post-treatment measurements approach indices
ITEM-ANALYSIS PROCEDURES
Item Discrimination

 The criterion groups difference index (CGDI) is the proportion of respondents in the group known to have less of the trait or attribute of interest who answered the item appropriately or correctly subtracted from the proportion of respondents in the group known to possess more of the trait or attribute of interest who answered it correctly.
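
A minimal sketch of the CGDI computation: the item p level in the "more of the trait" group minus the p level in the "less of the trait" group. The data are illustrative:

# Minimal sketch: criterion groups difference index (CGDI).
# Each list holds 1/0 item responses for one criterion group.

def cgdi(more_group, less_group):
    p_more = sum(more_group) / len(more_group)
    p_less = sum(less_group) / len(less_group)
    return p_more - p_less

experienced   = [1, 1, 1, 0, 1, 1, 1, 1]  # group with more of the trait
inexperienced = [0, 1, 0, 0, 1, 0, 0, 1]  # group with less of the trait
print(cgdi(experienced, inexperienced))   # 0.875 - 0.375 = 0.5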
ITEM-ANALYSIS PROCEDURES
Item Discrimination

 Pretreatment/post-treatment measurements approach
 Three item-discrimination indices are:
 (1) pretest/posttest difference
 (2) individual gain
 (3) net gain
ITEM-ANALYSIS PROCEDURES
Item Discrimination
 The pretest/posttest difference index (PPDI) is the proportion of respondents who answered the item correctly on the posttest minus the proportion who responded to the item correctly on the pretest
 The individual gain index (IGI) is the proportion of respondents who answered the item incorrectly on the pretest and correctly on the posttest
 The net gain index (NGI) is the proportion of respondents who answered the item incorrectly on both occasions subtracted from the IGI.
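
A minimal sketch of the three indices just defined, assuming paired 1/0 responses per respondent on the pretest and posttest; the data are illustrative:

# Minimal sketch: PPDI, IGI, and NGI for one item.
# pre[i], post[i] = 1/0 responses of respondent i before/after treatment.

def ppdi_igi_ngi(pre, post):
    n = len(pre)
    ppdi = sum(post) / n - sum(pre) / n
    igi = sum(1 for a, b in zip(pre, post) if a == 0 and b == 1) / n
    wrong_both = sum(1 for a, b in zip(pre, post) if a == 0 and b == 0) / n
    ngi = igi - wrong_both
    return ppdi, igi, ngi

pre  = [0, 0, 1, 0, 0, 1, 0, 0]
post = [1, 1, 1, 0, 1, 1, 1, 0]
print(ppdi_igi_ngi(pre, post))  # PPDI = 0.50, IGI = 0.50, NGI = 0.25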
ITEM-ANALYSIS PROCEDURES
Item Discrimination

 NGI provides the most conservative estimate of item discrimination and uses more information.
 The range of values for each of the indices discussed above is -1.00 to +1.00,
 except for IGI, which has a range of 0 to +1.00.
 A high positive index for each of these item-discrimination indices is desirable.
