Measurement Presentation (Issues in Survey Research)
Definition of Measurement
Assigning numbers to objects or entities
according to some logical or systematic rule.
Measurement processes are fundamental to all
sciences.
Examples:
Self Report: Depression, personality
characteristics, intelligence.
Physiological indices: Heart rate, blood pressure, cortisol levels, IL-6, A1C levels
Observational assessments: Parent-Child &
Couple warmth and hostility
Qualitative Research
Some would argue that qualitative
research based on observations of the
researcher (e.g., participant observation)
does not involve measurement.
Note that implicit measurement processes
are occurring, based on how the
investigator characterizes the entity being
assessed (e.g., categorical judgments).
Issues of reliability (e.g., repeatability
across observers or coders) and validity
(e.g., bias) of those characterizations of
the entity arise.
Psychometric Theory
Assumption is that we are attempting to assess a concept or construct that is not directly observable and can only be assessed indirectly via measurement procedures (i.e., a latent variable).
Example: Each of you is asked to rate the
extraversion of a job candidate after watching the
videotape of the individual interacting with others.
Assume that you would not all come up with the same score.
The set of ratings should be roughly normally distributed around the average value.
Issue: What would be the most accurate estimate of his
or her extraversion?
Distribution of Scores
[Histogram of the extraversion ratings: frequencies from 0 to 100 plotted across Extraversion scores ranging from 17.60 to 18.30]
Classic Measurement Equation
x = t + e
x = measured (observed) variable
t = true score
e = random error
[Path diagram: a latent Loneliness factor with three indicators (Item 1, Item 2, Item 3), each with its own error term (Error 1, Error 2, Error 3)]
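Assuming the error term is random and uncorrelated with the true score (the standard assumption of the classic model), the variances add:
Var(x) = Var(t) + Var(e)
Reliability can then be expressed as Var(t) / Var(x), the proportion of observed-score variance attributable to the true score.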
Adequacy of Measures
Reliability: Reflects the accuracy of a
measure
Represented by the variance in a
measure due to t, or the underlying
construct that is being measured
Formulas reflect the variance due to t
relative to the total variance in scores
on a measure
Validity: Does the measure assess the construct that it was designed to measure?
Issue: What is t? Is it the construct
you had in mind?
Reliability
Refers to the repeatability or consistency
of a measure.
Issue first arose in astronomical measurement,
where observers were found to differ from one
another in their measurements of stars.
Systematic Error
Reliability Example
Measure has a reliability of .80.
Implication: 80% of the variation in scores on that measure is said to reflect true differences between the entities being assessed (e.g., individual differences in loneliness)
Remaining 20% of variance
represents random measurement
error.
Test-Retest Reliability
Refers to the stability of scores when the same measure is administered to the same individuals on two occasions (the correlation between Time 1 and Time 2 scores).
Internal Consistency:
This is a form of reliability that can be
evaluated when your measure employs
multiple items in assessing the construct.
Involves the consistency of the person's
responses across items.
Domain Sampling model: Assumption is
that the items you have developed for a
measure represent a random sample of
the content domain of the construct.
Example: Social support; select items that
reflect different types of support, such as
emotional support or tangible assistance.
Variation in responding to the items therefore
reflects errors in measurement.
Standardized Coefficient Alpha (α)
α = (k * r) / [1 + (k - 1) * r]
k = # of items
r = average inter-item correlation
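A minimal SPSS sketch (hypothetical item names) for obtaining coefficient alpha for a multi-item scale; /SUMMARY=TOTAL adds item-total statistics:
RELIABILITY
  /VARIABLES=item1 item2 item3 item4 item5
  /MODEL=ALPHA
  /SUMMARY=TOTAL .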
Derivations
Can increase the reliability of a measure by:
Increasing the number of items, assuming that the same level of correlation (or covariation) among responses to the individual items is maintained (see the worked example below).
Increasing the correlation or covariation among the items
Response format:
1. Hardly ever
2. Some of the time
3. Often
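Worked illustration (hypothetical values): with an average inter-item correlation of r = .30, the alpha formula gives α = (5 * .30) / [1 + 4 * .30] = .68 with 5 items but α = (10 * .30) / [1 + 9 * .30] = .81 with 10 items; adding items that maintain the same average correlation raises reliability.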
Inter-Rater Agreement
Data are sometimes collected by raters or
coders, who evaluate the objects that are
being assessed and assign a number or
numbers.
Examples: Assessments of clinical depression,
coding of behavior during interactions.
Design issue: Rater drift.
Creation of Random
Samples
SPSS syntax:
* Flags roughly 20% of cases (coder3 = 1) via a Bernoulli draw with p = .2 .
COMPUTE coder3 = RV.BERNOULLI(.2) .
EXECUTE .
Example: Sample Agreement 1.sav
Kappa Coefficient
Example: kappa.sav
κ = (po - pc) / (1 - pc)
po = observed % agreement
pc = chance % agreement
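A minimal SPSS sketch for computing kappa from two coders' categorical codes (coder1 and coder2 are hypothetical variable names):
CROSSTABS
  /TABLES=coder1 BY coder2
  /STATISTICS=KAPPA
  /CELLS=COUNT .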
Example
% Agreement (PO): 8/10 = .80
% Chance Agreement (PC), based on each coder's marginal proportions of .70 and .30:
PC = (.7*.7) + (.3*.3) = .58
Kappa (κ) = (PO - PC)/(1 - PC)
= (.80-.58)/(1-.58) = .22/.42
= .524
Continuous Measures
The Intra-class correlation is
computed when you have data from
observers on continuous measures
Example: ISBR coding of Warmth &
Hostility in videotaped family
interactions
Ratings are made on several scales,
which are summed together
Issue: Raters may not agree although
their scores may be correlated
[Two example tables: ratings of a set of targets by Rater 1 and Rater 2, illustrating scores that are correlated but not identical]
Consistency Definition
ANOVA table
Between People: Differences between
the individuals being evaluated
MSB = Mean Square Between
MSE: Residual variance; consists of:
Between Items (Judges or Raters)
Residual
ICC (consistency) = (MSB - MSE) / [MSB + (k - 1)*MSE] = (4.683 - .683) / (4.683 + .683) = .745 (k = 2 raters)
Equivalent to the inter-item correlation
Note that you are ignoring the variance due to
differences between the judges or raters
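A minimal SPSS sketch for the consistency ICC with two raters (hypothetical variable names); the Single Measures line of the output corresponds to the formula above:
RELIABILITY
  /VARIABLES=rater1 rater2
  /ICC=MODEL(MIXED) TYPE(CONSISTENCY) CIN=95 .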
Absolute Agreement
Definition
This design would be used when you have
a single expert coder who checks a % of
the interactions that have been coded.
Note that this expert is always the same
individual for all coders.
ICC (absolute agreement) = (MSB - MSE) / {MSB + (k - 1)*MSE + (k/n)*(MSJ - MSE)}
= (4.683 - .683) / {4.683 + .683 + [(2/6)*(80.083 - .683)]}
ICC = .126
(k = 2 raters, n = 6 targets, MSJ = mean square for judges/raters)
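A minimal SPSS sketch for the absolute-agreement ICC (hypothetical variable names; whether MODEL is RANDOM or MIXED depends on how the raters were selected):
RELIABILITY
  /VARIABLES=expert coder1
  /ICC=MODEL(RANDOM) TYPE(ABSOLUTE) CIN=95 .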
Dis-Attenuated Correlation
r' = r / √(r11 * r22)
r' = dis-attenuated correlation
r = observed correlation
r11, r22 = reliabilities of the two measures
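Worked illustration (hypothetical values): if two measures correlate .40 and have reliabilities of .80 and .70, then r' = .40 / √(.80 * .70) = .40 / .75 = .53, the estimated correlation between the two constructs if both were measured without error.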
Standard Error of an
Individual Score
SEx = σx * √(1 - rxx)
σx = standard deviation of the measure
rxx = reliability of the measure
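Worked illustration (hypothetical values): for a scale with a standard deviation of 10 and a reliability of .80, SEx = 10 * √(1 - .80) = 4.47, so an individual's observed scores would be expected to vary around the true score with a standard deviation of about 4.5 points.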
Rely = 1 - [Σσ²i - Σ(ri * σ²i)] / σ²y
σ²i = variances of the individual measures
ri = reliabilities of the individual measures
σ²y = variance of the total score
Computation of Reliability
Total, Reliable, and Error Variance of Composite Variables
Total Variance = 27.13
Reliable Variance = 19.04
Error Variance = 8.09
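Applying the earlier definition of reliability (reliable variance relative to total variance), the reliability of this composite is 19.04 / 27.13 ≈ .70.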
Reliability of Difference Score

Measure             Reliability   Variance   Reliable Variance   Error Variance
Time 1              .72           25.15      18.11               7.04
Time 2              .76           28.83      21.91               6.92
Difference Score                  27.75
Sum Score                         79.53
Computation
Time 1 & Time 2 Measures Error
Variance:
Error = 7.04 + 6.92 = 13.96
% Error Variance:
Difference Score = 13.96/27.75 = .50
Sum Score = 13.96/79.53 = .18
Reliability:
Difference Score = 1 - .50 = .50
Sum Score = 1 - .18 = .82
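Note that the same 13.96 units of error variance make up half of the (much smaller) difference-score variance but less than a fifth of the sum-score variance, which is why difference scores are typically much less reliable than sums of the same measures.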
Validity
Definition: Does the measure assess the construct it was designed to assess?
Issue: What is t?
What other construct or constructs
affect scores on the measure?
Social Desirability
Desire on the part of some
individuals to appear in a positive
light.
Most constructs we want to assess have
a positive and a negative endpoint
Some individuals may fake good on
the measure.
Marlowe-Crowne SD Scale
Measures individual differences in the
tendency to appear in a positive light.
Example items:
"I am always ready to admit it when I
make a mistake
"I always try to practice what I preach."
Impact on measures:
Examine correlation between scores on
SD measure and other measures
Example: Taylor Manifest Anxiety Scale
                              Father      Father      Mother      Mother
                              Hostility   Warmth      Hostility   Warmth
Observed Behavior             .30*        .25*        .31*        .26*
Spouse Negative Affectivity   .26*        -.25*       .22*        -.10*
R2                            .16*        .14*        .15*        .09*
Halo Effects
Bias due to overall positive or
negative feelings about the
individual; may distort ratings of
performance or other characteristics.
BARS measures: Behaviorally
Anchored Rating Scales; simply
identify which behaviors occur.
Appears to overcome halo effects.
Data: ..\..\..\Loneliness\Iowa Family Survey 2005\Iowa Family 2005 Survey Data April 07.sa
Types of Validity
Content Validity
Does the measure adequately represent
the meaning of the construct?
Criterion Validity
Do scores on the measure predict
criteria that reflect the construct?
Construct Validity
Are the results based on the measure
consistent w/ theoretical predictions?
Content Validity I
Issue: Does the measure adequately represent
the content domain of the construct that is being
assessed?
Examples:
Content Validity II
Validation: Based on expert judgment regarding
content of a measure or test.
Power vs. Additive approach: Narrow vs. broad
conceptualization of the construct; has
implications for item content, test (factor)
structure.
Disguised tests: Content validity is irrelevant.
Examples: Rorschach, TAT, MMPI (Item: "I attend church
regularly"; an indicator of schizophrenia).
Psychoanalytic conceptualization: Impact of defense
mechanisms on responding to structured tests;
"objective" vs. projective assessments. Example: TAT
assessment.
Criterion Validity
[Scatterplot: UCLA Loneliness Scale scores plotted against loneliness ratings]
Definition: Demonstrating that scores on your measure are associated with other methods of assessing the same construct.
Loneliness: Scores on the UCLA scale correlated .71 with scores on a measure based on ratings on 7 loneliness scales
Paranormal experiences:
Loneliness:
Found to predict subsequent nursing
home admission, mortality among the
elderly, post-partum depression.
Discriminant Validation
Issue: Demonstrating that your measure
assesses a construct that is different or distinct
from measures of related constructs.
Loneliness scale: Addressed this issue by
conducting a regression analysis wherein we used
measures of other constructs to predict scores on
the loneliness scale.
Scores correlated .71 w/ Loneliness Index.
Loneliness scores were strongly related to measures of
depression (.51), extraversion (-.46), self-esteem (-.49). In
combination, these other variables explained 43% of the
variance in loneliness scores.
After controlling for these other variables, scores on a measure
termed the Loneliness Index accounted for an additional 18%
of the variance in loneliness scores.
Scores on the loneliness scale remained related to time alone
(partial r = .27), number of times eat dinner alone on a Friday
or Saturday night (partial r = .31) and number of friends
(partial r = -.27).
                            Job            Job            Organizational
                            Satisfaction   Involvement    Commitment
Job Satisfaction            1.00
Job Involvement             .59            1.00
Organizational Commitment   .55            .55            1.00
Multimethod-Multitrait
Analysis
Example Results
       LSR      LRR      LBM      SSR      SRR      SBM
LSR    1.00
LRR    (.70)    1.00
LBM    (.70)    (.70)    1.00
SSR    [.50]    {.20}    {.20}    1.00
SRR    {.20}    [.50]    {.20}    (.70)    1.00
SBM    {.20}    {.20}    [.50]    (.70)    (.70)    1.00
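In the standard multitrait-multimethod layout, the parenthesized values (.70) are the convergent validities (the same construct assessed by different methods), the bracketed values [.50] are correlations between different constructs assessed by the same method, and the braced values {.20} involve different constructs and different methods; evidence for validity requires the convergent correlations to be the largest, as here.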
Theory Verification
Issue: Are results based on your measure
consistent with theoretical models involving the
construct? Considered the highest form of
validity, wherein you demonstrate that scores on
your measure relate to measures of other
constructs as you would expect, given theoretical
models.
Problem: What if your empirical evidence is
negative? Is the problem with the theory or the
measure? Have to rely on well-developed and
accepted theory.
Contrast with criterion validity: here you are relating the measure to measures of other constructs, rather than to other assessments of the same construct.
Results
Emotional arousal: Interaction between
treatment condition & BPS scores
ESP Proven: r = -.31
ESP Disproven: r = .37
Recall:
Participants were given a surprise recall test (% correct recall) after they completed the emotional arousal measure
ESP Disproven: r = -.38
ESP Proven: r = .07
Conclusions
Validity of a measure is never "proven".
Development of a body of literature supporting the
measure's validity.
Continuing evolution and improvement of measures.
(Loneliness example, revisions over the years).
Makes you very popular. Recently put information on my
Web site; has increased the requests for the measure.
Issue: Putting the scale on the Internet raises copyright problems.