Chapter 5 Reliability

10/26/2022
The Concept of Reliability
Reliability: Consistency in measurement.
Reliability coefficient: an index of reliability, a proportion that indicates the ratio between the true score variance on a test and the total variance.
McGraw-Hill/Irwin © 2013 McGraw-Hill Companies. All Rights Reserved. 5-2
The Concept of Reliability
Observed score = True Score + Error
Error refers to the component of the observed score that does not have to do with the test taker's true ability or the trait being measured.

Reliability Estimates
Variance = Standard deviation squared
Variance = True Variance + Error Variance
Reliability Estimates
• An instrument is said to be reliable if it accurately reflects the true score, and thus minimizes the error component.
– If you see a reliability coefficient of .85, this means that 85% of the variability in observed scores is presumed to represent true individual differences and 15% of the variability is due to measurement error.

Reliability Estimates
Measurement Error: all of the factors associated with the process of measuring some variable, other than the variable being measured.
• A test written in English can introduce measurement error if the test takers are students from different countries.
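The variance decomposition and the .85 interpretation above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the true-score variance (85), the error variance (15), the mean of 50, and normally distributed scores are all made-up values chosen so that the expected reliability is .85. In real testing the true scores are never observed, so this ratio cannot be computed directly and has to be estimated.

```python
import random
import statistics

random.seed(0)  # fixed seed so repeated runs give the same numbers

N = 20_000
TRUE_VAR = 85.0   # assumed true-score variance (illustrative)
ERROR_VAR = 15.0  # assumed error variance (illustrative)

# Observed score = true score + error, with error independent of the true score.
true_scores = [random.gauss(50, TRUE_VAR ** 0.5) for _ in range(N)]
errors = [random.gauss(0, ERROR_VAR ** 0.5) for _ in range(N)]
observed = [t + e for t, e in zip(true_scores, errors)]

true_var = statistics.pvariance(true_scores)
total_var = statistics.pvariance(observed)

# Reliability coefficient = true variance / total variance.
reliability = true_var / total_var
print(f"true variance  : {true_var:.1f}")
print(f"total variance : {total_var:.1f}")
print(f"reliability    : {reliability:.2f}")
```

With these assumed variances the printed reliability should land close to .85, which matches the reading that about 85% of observed-score variability reflects true differences.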
The Concept of Reliability
Measurement Error
Random Error: a source of error in measuring a targeted variable, caused by unpredictable inconsistencies of other variables in the measurement process (e.g., noise).
Systematic Error: a source of error in measuring a variable that is typically constant or proportionate to what is presumed to be the true value of the variable being measured (e.g., a weight scale that reads half a kilogram low).

Sources of Error Variance
• Test Construction
• Test Administration
• Test Scoring and Interpretation
• Other Sources
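The difference between random and systematic error can be sketched with the weight-scale example. The true weight (70 kg), the noise level, and the number of readings are hypothetical values chosen for illustration; only the half-kilogram bias comes from the slide.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

TRUE_WEIGHT = 70.0  # hypothetical true value, in kg
BIAS = -0.5         # systematic error: the scale reads half a kilogram low
NOISE_SD = 0.3      # assumed spread of the random error, in kg


def read_scale(bias, n=2000):
    """Simulate n readings: true value + constant bias + random noise."""
    return [TRUE_WEIGHT + bias + random.gauss(0, NOISE_SD) for _ in range(n)]


mean_random_only = statistics.mean(read_scale(bias=0.0))
mean_biased = statistics.mean(read_scale(bias=BIAS))

print(f"random error only : mean reading {mean_random_only:.2f} kg")
print(f"with systematic   : mean reading {mean_biased:.2f} kg")
```

Averaging many readings cancels most of the random error, so the first mean sits near the true value. The systematic error does not average out: the second mean stays about half a kilogram low no matter how many readings are taken.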
Sources of Error Variance
• Test Construction
Variation may exist within items on a test or between tests (e.g., item sampling or content sampling).

Sources of Error Variance
• Test Administration
– Testing Environment
• e.g., temperature, level of lighting, amount of ventilation and noise.
– Testtaker-Related Variables
• e.g., pressing emotional problems, physical discomfort, lack of sleep, and the effects of drugs or medication.
– Examiner-Related Variables
• e.g., physical appearance and behaviors may play a role.

• Test Scoring and Interpretation
– Subjectivity in scoring can enter into behavioral assessment (e.g., projective tests).
– Computer testing reduces error in test scoring, but many tests still require expert interpretation.

• Other Sources of Error Variance
– Sampling error: the extent to which the sample of the study was actually representative of the population of the study.

Other Sources of Error Variance
• Methodological Error
– Interviewers may not have been trained properly.
– The wording in the questionnaire may have been ambiguous.
– The items may have somehow been biased (e.g., double-barrelled questions, leading questions).

Reliability Estimates
• Reliability is a correlation computed between two events:
– Repeated use of the instrument (stability)
– Similarity of items (split half / internal consistency)
– Equivalence of two instruments (equivalence)

Test-Retest Reliability
The same group of respondents completes the instrument at two different points in time. How stable are the responses?
– Whether you measure a 12 cm length today, tomorrow, or next year, the ruler still measures 12 cm as 12 cm.

Test-Retest Reliability
– Most appropriate for variables that should be stable over time (e.g., personality) and not appropriate for variables expected to change over time (e.g., mood).
– Estimates tend to decrease as time passes.
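A test-retest estimate is a correlation between the two administrations. The sketch below computes a Pearson r by hand for six respondents; the scores are invented for illustration, and `pearson_r` is a helper written here, not a function from any particular package.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores for the same six respondents at two points in time.
time_1 = [12, 15, 9, 20, 17, 11]
time_2 = [13, 14, 10, 19, 18, 10]

test_retest_r = pearson_r(time_1, time_2)
print(f"test-retest reliability: {test_retest_r:.2f}")
```

Because each respondent keeps roughly the same rank at both times, the coefficient is high. A trait that shifts between administrations, such as mood, would scatter the second set of scores and pull the coefficient down.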
Estimates of test-retest reliability might be low:
– In a math test, if a tutorial is taken before the second administration.
– In a personality test, if the test taker suffered emotional trauma or received counseling before the second administration.
– During times of great developmental change, when even a short interval between testings might result in low test-retest reliability.

Parallel-Forms or Alternate-Forms
• Involves using differently worded questions to measure the same construct.
– Questions or items are reworded to produce two items that are similar but not identical.
– Items must focus on the same exact aspect of behavior with the same vocabulary level and same level of difficulty.

Parallel-Forms or Alternate-Forms
• Reliability is the correlation between the responses to the pairs of questions.
• Alternate-forms reliability is said to avoid the practice effects that can inflate test-retest reliability (i.e., respondents can recall how they answered the identical item on the first test administration).

Parallel-Forms or Alternate-Forms
• Test A: All college students graduate sooner or later.
• Test B: Everyone who works for a good company gets a promotion sooner or later.
Reliability Estimates
Parallel-Forms and Alternate-Forms are similar in two ways:
• Reliability is checked by administering two forms of a test to the same group.
• Scores may be affected by error related to the state of testtakers (e.g., practice, fatigue) or item sampling.

Internal Consistency: Homogeneity
• It is a measure of how well the items are related to one another.
– Do the different items all measure the same thing?
• It is applied to groups of items thought to measure different aspects of the same concept.
Internal Consistency Example
• The Rand 36-item Health Survey measures 8 dimensions of health. One of these dimensions is “physical function.”
• Instead of asking just one question, “How limited are you in your day-to-day activities?” Rand found that asking 10 questions produced more reliable results, and conveyed a better understanding of “physical function.”

Internal Consistency: Rand Example
• The following questions are about activities you might do during a typical day. Does your health now limit you in these activities? If so, how much? (Response options are: limited a lot, limited a little, not limited at all.)
– Vigorous activities, such as running, lifting heavy objects, participating in strenuous sports.
– Moderate activities, such as moving a table, pushing a vacuum cleaner, bowling, or playing golf.
– Lifting or carrying groceries.
– Climbing several flights of stairs.
– Climbing one flight of stairs.
– Bending, kneeling, or stooping.
– Walking more than a mile.
– Walking several blocks.
– Walking one block.
– Bathing or dressing yourself.
Internal Consistency: Homogeneity
Split-half reliability is obtained by correlating two pairs of scores obtained from equivalent halves of a single test administered once. It entails three steps:
1. Divide the test into equivalent halves.
2. Calculate a Pearson r between scores on the two halves of the test.
3. Adjust the half-test reliability using the Spearman-Brown formula.

Ways of splitting a test
– Dividing in the middle (tends to give low reliability)
– Assigning odd-numbered items to one half and even-numbered items to the other half of the test (odd-even reliability)
– Randomly assigning items to one or the other half of the test
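The three steps can be sketched in a few lines, using the odd-even split. The item responses below are invented (five test takers, six right/wrong items), `pearson_r` and `split_half_reliability` are helper names made up for this sketch, and the step-3 adjustment uses the standard two-halves form of the Spearman-Brown formula, r_SB = 2r / (1 + r).

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def split_half_reliability(item_scores):
    """Return (half-test r, Spearman-Brown adjusted r) for an odd-even split."""
    # Step 1: divide the test into halves (items 1, 3, 5, ... vs. 2, 4, 6, ...).
    odd_half = [sum(row[0::2]) for row in item_scores]
    even_half = [sum(row[1::2]) for row in item_scores]
    # Step 2: Pearson r between the two half-test scores.
    r_half = pearson_r(odd_half, even_half)
    # Step 3: Spearman-Brown adjustment to full test length.
    r_full = 2 * r_half / (1 + r_half)
    return r_half, r_full


# Hypothetical data: one row per test taker, one column per item (1 = correct).
responses = [
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 0, 0],
]

r_half, r_full = split_half_reliability(responses)
print(f"half-test r         : {r_half:.2f}")
print(f"Spearman-Brown adj. : {r_full:.2f}")
```

The adjusted value is higher than the half-test correlation because each half is only half as long as the real test, and shorter tests tend to be less reliable.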
The Spearman-Brown Formula
– Usually, but not always, reliability increases as test length increases.
– The split-half correction is a specific application of a more general formula to estimate the reliability of a test that is lengthened or shortened by any number of items.

Other Methods of Estimating Internal Consistency
Kuder-Richardson formula 20 (KR-20): the inter-item consistency of dichotomous items.
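The formula itself did not survive on the slide. The general Spearman-Brown form found in standard psychometrics references is r_SB = n·r / (1 + (n − 1)·r), where r is the reliability of the existing test and n is the ratio of the new test length to the old one. The sketch below assumes that form; the function name and the starting reliability of .60 are illustrative.

```python
def spearman_brown(r, n):
    """Predicted reliability when a test is made n times as long.

    r: reliability of the existing test
    n: new length divided by old length (2 doubles the test, 0.5 halves it)
    """
    return n * r / (1 + (n - 1) * r)


current_r = 0.60  # hypothetical reliability of the existing test

for factor in (0.5, 1, 2, 3):
    predicted = spearman_brown(current_r, factor)
    print(f"length x{factor}: predicted reliability {predicted:.3f}")
```

Setting n = 2 gives the split-half correction 2r / (1 + r). The prediction assumes the added items are equivalent to the existing ones, which is one reason reliability usually, but not always, rises with length.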
Other Methods of Estimating Internal Consistency
Cronbach's Alpha is the mean of all possible split-half correlations.
– Developed by Cronbach (1951).
– Is a function of the number of items in the scale and the degree of their intercorrelations.
– Corrected by the Spearman-Brown formula.
– The most popular approach for internal consistency.
– Values typically range from 0 to 1.
– Tends to be high when the scale has more than 25 items.

• Measures of reliability are estimates, and estimates are subject to error.
• The reliability coefficient varies with the sample of testtakers.
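In practice alpha is usually computed from item variances rather than by averaging split-half correlations. The sketch below uses the common variance form, alpha = k/(k − 1) × (1 − Σ item variances / total-score variance), and also computes KR-20, which replaces each item variance with p × q for right/wrong items. The data and function names are invented, and population variances are used throughout so that the two coefficients coincide on dichotomous data.

```python
import statistics


def cronbach_alpha(item_scores):
    """Cronbach's alpha from a list of rows (one row per test taker)."""
    k = len(item_scores[0])
    columns = list(zip(*item_scores))  # one tuple of scores per item
    item_var_sum = sum(statistics.pvariance(col) for col in columns)
    total_var = statistics.pvariance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - item_var_sum / total_var)


def kr20(item_scores):
    """Kuder-Richardson formula 20 for items scored 0 or 1."""
    k = len(item_scores[0])
    n = len(item_scores)
    columns = list(zip(*item_scores))
    # p = proportion passing the item, q = 1 - p
    pq_sum = sum((sum(col) / n) * (1 - sum(col) / n) for col in columns)
    total_var = statistics.pvariance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - pq_sum / total_var)


# Hypothetical data: five test takers, six right/wrong items.
responses = [
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 0, 0],
]

alpha = cronbach_alpha(responses)
kr20_value = kr20(responses)
print(f"Cronbach's alpha: {alpha:.3f}")
print(f"KR-20           : {kr20_value:.3f}")
```

The two values agree here because KR-20 is the special case of alpha for dichotomous items. For items with more than two response options, such as the three-option Rand questions, only the alpha function applies.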

Inter-Scorer Reliability
• The degree of agreement or consistency between two or more raters with regard to a particular measure.
• Coefficient of inter-scorer reliability: the scores from different raters are correlated with one another.

Inter-Scorer Reliability
• It is often used with behavioral measures.
• Guards against biases in scoring.

The nature of the test will often determine the reliability metric.
– the test items are homogeneous or heterogeneous in nature
• If the test is designed to measure one factor, its items are expected to be homogeneous, and internal consistency estimates are appropriate.
• For heterogeneous tests it might be more appropriate to use test-retest reliability.

– the characteristic, ability, or trait being measured is presumed to be dynamic or static
• For dynamic characteristics it is most appropriate to use a measure of internal consistency.
• For static characteristics it is most appropriate to use test-retest and alternate-forms reliability.

– the range of test scores is or is not restricted
• If the range is restricted by the sampling procedure used, the variance of the variable will be low, which will lead to a smaller correlation coefficient.

– the test is a speed or a power test
• For speed tests we can use test-retest reliability, alternate-forms reliability, and split-half reliability from two separately timed half tests (two independent testing periods).
• Because item difficulty is low in most speed tests, internal consistency measures from a single timed administration will be spuriously high.
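The effect of a restricted range on a correlation can be seen in a short simulation. Everything here is assumed for illustration: two normally distributed scores built to correlate about .70, a sample of 5,000, and a sampling rule that keeps only people scoring more than one standard deviation above the mean on the first measure.

```python
import random
from math import sqrt

random.seed(2)  # fixed seed so the sketch is repeatable


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


RHO = 0.7  # assumed correlation in the full population
N = 5000

x = [random.gauss(0, 1) for _ in range(N)]
y = [RHO * xi + sqrt(1 - RHO ** 2) * random.gauss(0, 1) for xi in x]

# Full range: everyone is included.
r_full_range = pearson_r(x, y)

# Restricted range: the sampling procedure keeps only high scorers on x.
kept = [(xi, yi) for xi, yi in zip(x, y) if xi > 1.0]
r_restricted = pearson_r([p[0] for p in kept], [p[1] for p in kept])

print(f"full range       : r = {r_full_range:.2f} (n = {N})")
print(f"restricted range : r = {r_restricted:.2f} (n = {len(kept)})")
```

The relationship between the two scores is the same in both samples, but the restricted sample has much less variance on the first measure, so the correlation computed from it is noticeably smaller.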
The Standard Error of Measurement (SEM)
• Provides a measure of the precision of an observed test score.
• An estimate of the amount of error inherent in an observed score or measurement.

The Standard Error of Measurement
– Standard error can be used to estimate the extent to which an observed score deviates from a true score.
– The higher the reliability of the test, the lower the standard error.
– Confidence interval: a range of test scores that is likely to contain the true score.
The Standard Error of Measurement
• Standard Error of Measurement: $\mathrm{SEM} = SD \sqrt{1 - \text{reliability}}$
– The confidence interval (band) is established by multiplying 1.96 times the standard error of measurement and then adding that amount to, and subtracting it from, the observed score.
Confidence Interval = observed score ± (SEM x 1.96)
• About 95% of the area under the normal curve lies within roughly 2 standard deviations of the mean; the exact z score associated with 95% is 1.96.
The Standard Error of Measurement
A test of attention span has a reliability coefficient of .84. The average score on the test is 10, with a standard deviation of 5. Lawrence received a score of 64 on the test.
$\mathrm{SEM} = SD \sqrt{1 - \text{reliability}} = 5 \sqrt{1 - .84}$
– SEM = 2
– Confidence Interval = SEM x 1.96 = 2 x 1.96 = 3.92
• We can be 95% sure that Lawrence's "true" attention span score falls between about 60 and 68 (64 ± 3.92).
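The arithmetic in this example can be checked with a short script. The function names are made up for this sketch; the inputs (SD of 5, reliability of .84, observed score of 64, z of 1.96) are the ones given on the slide.

```python
from math import sqrt


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability)."""
    return sd * sqrt(1 - reliability)


def confidence_band(observed_score, sd, reliability, z=1.96):
    """Return (low, high) limits around an observed score; z = 1.96 gives 95%."""
    margin = z * standard_error_of_measurement(sd, reliability)
    return observed_score - margin, observed_score + margin


sem = standard_error_of_measurement(sd=5, reliability=0.84)
low, high = confidence_band(observed_score=64, sd=5, reliability=0.84)

print(f"SEM = {sem:.2f}")
print(f"95% band: {low:.2f} to {high:.2f}")
```

The band runs from 60.08 to 67.92, which the slide rounds to 60 and 68. A more reliable test would shrink the SEM and narrow the band around the same observed score.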