Group 7 Handouts
MEASUREMENT
Theories of Measurement
Psychometrics is the branch of psychology concerned with the theory and methods of
psychological measurement. Health measurement has been strongly influenced by
psychometrics, although differences in aims and conceptualizations have begun to
emerge.
Classical test theory (CTT) is a psychometric theory of measurement that has been
dominant until fairly recently. CTT has been used as the basis for developing multi-item
measures of health constructs and is also appropriate for conceptualizing all types of
measurement.
Item response theory (IRT) is an appropriate framework only for multi-item scales and
tests.
Errors of Measurements
Procedures for obtaining measurements, as well as the objects being measured, are
susceptible to influences that can alter the resulting data. Some influences can be
controlled or minimized, and attempts should be made to do so, but such efforts are rarely
completely successful. In CTT, an observed score is therefore conceptualized as a true
score plus an error component:

Observed score = True score + Error
or
Xo = Xt + Xe
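As a minimal illustration of the CTT equation, the following Python sketch builds observed scores as true score plus error. All the numbers are hypothetical: true scores are invented and the errors are random noise.

```python
import random

random.seed(1)

# Hypothetical illustration of the CTT model Xo = Xt + Xe:
# each observed score is a person's true score plus random error.
true_scores = [50, 62, 71, 48, 55]                   # Xt (unknown in practice)
errors = [random.gauss(0, 2) for _ in true_scores]   # Xe, mean-zero noise
observed = [t + e for t, e in zip(true_scores, errors)]  # Xo

for t, e, o in zip(true_scores, errors, observed):
    print(f"true={t:5.1f}  error={e:+5.2f}  observed={o:5.1f}")
```

In practice the decomposition runs the other way: only Xo is observable, and the theory's task is to estimate how much of it reflects Xt rather than Xe.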
A Measurement Taxonomy
The field of health measurement was in some turmoil for many years with regard to
measurement terminology and definitions.
Recently, a working group in the Netherlands used a Delphi-type approach with a panel
of health measurement experts to identify key measurement properties and to develop
a taxonomy and definitions of those properties.
The result was the creation of COSMIN, the COnsensus-based Standards for the selection
of health Measurement INstruments.
RELIABILITY
The reliability of a quantitative measure is a major criterion for assessing its quality.
Reliability is the extent to which scores for people who have not changed are the same
for repeated measurements under several conditions, including repetition on different
occasions, by different persons, on different versions of a measure, or in the form of
different items on a multi-item instrument.
The first component within the broad reliability domain is simply called reliability. It covers
four different approaches to reliability assessment, including the following:
1. Test-retest reliability
2. Interrater reliability
3. Intrarater reliability
4. Parallel test reliability
Test-Retest Reliability
Takes the form of administering a measure to the same people on two occasions.
This type of reliability is sometimes called stability or reproducibility – the extent to which
scores can be reproduced on repeated administration.
Interrater or Intrarater Reliability
Interrater reliability assessment involves comparing different observers' scores to see
whether the scores are comparable.
Intrarater reliability is assessed when the same rater makes the measurements on two
or more occasions, blinded to the ratings assigned previously.
A simple index of interrater agreement is:

Proportion of agreement = Number of agreements / (Number of agreements + Number of disagreements)
Reliability can also be expressed in terms of variability:

Vo = Vt + Ve

and the reliability coefficient is

R = Vt / Vo

where Vo = observed variability, Vt = true variability, and Ve = error variability.
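A small Python sketch with made-up numbers illustrates both indices: the proportion of agreement between two raters, and reliability as the ratio of true to observed variability.

```python
# Hypothetical interrater data: 1 = the two raters agreed on a case,
# 0 = they disagreed.
ratings = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
agreements = sum(ratings)
disagreements = len(ratings) - agreements
proportion_agreement = agreements / (agreements + disagreements)
print(proportion_agreement)  # 8 agreements out of 10 cases -> 0.8

# Reliability as the proportion of observed variability that is
# true variability: R = Vt / Vo, where Vo = Vt + Ve.
Vt, Ve = 40.0, 10.0   # hypothetical true and error variability
Vo = Vt + Ve
R = Vt / Vo
print(R)  # -> 0.8
```

Note that Vt and Ve are never known directly; reliability studies are designed precisely to estimate R from observable data such as repeated or parallel measurements.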
VALIDITY
Validity is a second domain in the taxonomy of measurement properties.
Validity in a measurement context is defined as the degree to which an instrument is
measuring the construct it purports to measure.
CRITERION VALIDITY
Is the extent to which the scores on an instrument are a good reflection of a “gold
standard,” that is, a criterion considered an ideal measure of the construct.
FIVE CATEGORIES
Convergent Validity
Is the degree to which scores on the focal measure are correlated with scores on
measures of constructs with which there is a hypothesized relationship, that is, the
degree to which there is conceptual convergence.
Known-Groups Validity
Known-groups validity, which has also been called discriminative validity, relies on
hypotheses concerning a measure's ability to discriminate between two or more groups
known to differ with regard to the construct of interest.
Divergent Validity
Divergent validity, which is often called discriminant validity, concerns evidence that a
measure is not a measure of a different construct distinct from the focal construct.
Structural Validity
Refers to the extent to which the structure of a multi-item scale adequately reflects the
hypothesized dimensionality of the construct being measured.
Factor Analysis
Is a method for identifying clusters of related items, that is, the dimensions underlying
a broad construct.
Cross-Cultural Validity
Is the degree to which the components of a translated or culturally adapted measure
perform adequately and equivalently, individually and collectively, relative to their
performance on the original instrument.
Measuring Change
In clinical trials, statisticians have argued against using change scores as the
dependent variables in analyses of treatment effects.
A change score represents the amount of change between two scores.
RESPONSIVENESS
The ability of a measure to detect change over time in a construct that has changed,
commensurate with the amount of change that has occurred.
The Criterion Approach to Responsiveness
This approach to responsiveness assessment has also been called an anchor-based
approach, with the criterion serving as the anchor.
GENERATION
You will not be able to quantify an attribute adequately unless you thoroughly
understand the latent trait (the underlying construct) you wish to capture.
Traditional summated rating scales are based on classical test theory. In CTT, items are
presumed to be roughly comparable indicators of the underlying construct.
Latent trait scales using IRT models can use items like the ones used in CTT, such as
items in a Likert-type format; in fact, a person completing a scale would likely not know
whether it had been developed within the CTT or IRT framework.
An early step in scale construction is to develop a pool of possible items for the scale.
This is often easier to do as a team effort because different people articulate a similar
idea in diverse ways.
2. The literature - Ideas for item content often come from a thorough
understanding of prior research.
Number of Items
Response Options
Scale items involve both a stem (often a declarative statement) and response options.
Item Intensity
In a traditional summated rating scale, the intensity of the statements (stems) should be
similar and fairly strongly worded. If items are worded such that almost anyone would
agree with them, the scale will not be able to discriminate between people with different
amounts of the underlying trait.
A time frame should not emerge as a consequence of item development. You should
decide in advance, based on your conceptual understanding of the construct and the
needs for which the scale is being constructed, how to deal with time.
Items should be worded in such a manner that every respondent is answering the same
question.
Once a large item pool has been generated, it is time for critical appraisal. Care should
be devoted to such issues as whether individual items capture the construct and are
grammatical and well-worded. The initial review should also consider whether the items
taken together adequately embrace the full nuances of the construct.
In the next step, the initial pool of items is pretested. In a conventional pretest of a new
instrument, a small sample of people (20 to 40 or so) representing the target population
is invited to complete the items.
The panel of experts should include people with strong credentials with regard to the
construct being measured. Experts also should be knowledgeable about the target
population. In the first review, it is also desirable to include experts on scale
construction.
Preliminary Expert Review: Content Validation of Items
The experts' job is to evaluate individual items and the overall scale (and any
subscales), using guidelines established by the scale developer.
In the second round of content validation, a smaller group of experts (three to five) can
be used to evaluate the relevance of the revised set of items and to compute the scale
content validity index (S-CVI).
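Assuming the common convention that experts rate each item's relevance on a 4-point scale, with ratings of 3 or 4 counting as "relevant," a sketch of the computations (with hypothetical ratings) might look like this. The item-level index (I-CVI) is the proportion of experts rating the item relevant, and the averaging form of the scale-level index (S-CVI/Ave) is the mean of the I-CVIs:

```python
# Hypothetical content-validation data: 5 experts rate each of 4 items
# for relevance on a 4-point scale (1 = not relevant ... 4 = highly relevant).
ratings = {
    "item1": [4, 4, 3, 4, 3],
    "item2": [4, 3, 4, 4, 4],
    "item3": [2, 3, 4, 3, 2],
    "item4": [4, 4, 4, 3, 4],
}

# Item-level CVI (I-CVI): proportion of experts rating the item 3 or 4.
i_cvi = {item: sum(r >= 3 for r in rs) / len(rs) for item, rs in ratings.items()}

# Scale-level CVI, averaging approach (S-CVI/Ave): mean of the I-CVIs.
s_cvi_ave = sum(i_cvi.values()) / len(i_cvi)
print(i_cvi)
print(round(s_cvi_ave, 2))
```

Here item3 would stand out with a low I-CVI (0.6), flagging it for revision or deletion before the scale is finalized.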
• A representative sample includes older and younger respondents, men and women,
and people with varying educational and ethnic backgrounds.
• The instrument should include the scale items and basic demographic information.
• In all data collection efforts, care should be taken to make the instrument attractive,
professional-looking, and easy to understand.
> The analysis of data from a multi-item scale is a topic about which entire books
have been written.
Basic Item Analysis
• Basic descriptive information for each item should be examined. Items should have
good variability; without it, they will not correlate with the total scale and will not fare
well in a reliability analysis.
• This section deals with a type of factor analysis known as exploratory factor analysis
(EFA).
Factor Extraction
- Condenses items into a smaller number of factors and is used to identify the number
of underlying dimensions.
Factor Rotation
>Orthogonal Rotation
>Oblique Rotation
>Factor Loading
• Although test-retest reliability analysis has not been a standard feature of
psychometric assessment in nursing research, we urge developers of new scales to
gather information about both internal consistency and test-retest reliability.
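Internal consistency is usually summarized with Cronbach's alpha. A minimal Python sketch with hypothetical scores for five people on a three-item scale:

```python
from statistics import variance

# Hypothetical scores: each row is one person's responses to three items.
item_scores = [
    [3, 4, 3],
    [2, 2, 3],
    [4, 5, 4],
    [1, 2, 2],
    [5, 4, 5],
]

k = len(item_scores[0])
items = list(zip(*item_scores))              # columns: one tuple per item
totals = [sum(person) for person in item_scores]

# Cronbach's alpha:
# alpha = (k / (k - 1)) * (1 - sum of item variances / variance of totals)
alpha = (k / (k - 1)) * (1 - sum(variance(i) for i in items) / variance(totals))
print(round(alpha, 3))
```

Alpha captures internal consistency only; estimating test-retest reliability still requires a second administration, as in the stability example earlier.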
• The analyses undertaken in the development study often suggest the need to revise
or add items.
• Before deciding that your scale is finalized, it is a good idea to examine the content
of the items in the scale.
• Some scale developers create a total score that is the "average" across items so
that the total score is on the same scale as the items.
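The difference between summed and averaged scoring, with hypothetical responses:

```python
# Hypothetical item responses (1-5) for one person on a four-item scale.
responses = [4, 5, 3, 4]

summed = sum(responses)                      # summed score: possible range 4-20
averaged = sum(responses) / len(responses)   # mean score: back on the 1-5 item metric
print(summed, averaged)  # -> 16 4.0
```

Averaging keeps the total interpretable in the same units as the response options (here, a 4.0 on a 1-to-5 scale), which is why some developers prefer it.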