
CERTIFICATE IN PROFESSIONAL EDUCATION

VALIDITY AND RELIABILITY

JERRY D. CABANGON, Ph.D.


Professor
VALIDITY
 IS A TERM DERIVED FROM THE LATIN WORD “VALIDUS,” MEANING “STRONG.”
 THE ABILITY OF AN INSTRUMENT TO MEASURE WHAT YOU INTEND IT TO MEASURE, OR WHAT IT IS SUPPOSED TO MEASURE. (Colton & Covert, 2007)
VALIDITY
Is a matter of degree
 Assessment instruments are not merely valid or invalid.
 Validity exists in varying degrees across a continuum.
 Validity is a characteristic of the responses/data gathered.
VALIDITY
Is a matter of degree
 The greater the evidence of validity, the greater the likelihood of credible, trustworthy data.
 Hence the importance of establishing/testing validity before the instrument is used.
In order to gather evidence that an instrument is valid, we need to establish what it is measuring.

THREE (3) SOURCES OF INFORMATION THAT CAN BE USED TO ESTABLISH VALIDITY:

A. CONTENT-RELATED EVIDENCE/VALIDITY (the right content)
 CONTENT VALIDITY PERTAINS TO THE EXTENT TO WHICH THE TEST COVERS THE ENTIRE DOMAIN OF CONTENT.
- Does the instrument measure the content it is intended to measure?
Establishing Evidence of Content Validity

These steps are done during the assessment development stage:

1. Define the content domain that the assessment intends to measure.
2. Define the components of the content domain that should be represented in the assessment, through a literature review.
3. Write the items/questions that reflect this defined content domain.
4. Have a panel of topic experts review the items/questions.
Establishing Evidence of Content Validity

Suppose you are designing an instrument to measure undergraduate college teaching effectiveness.

1. Clearly define the domain of content that the assessment intends to represent. (Determine the topics/principles related to college teaching effectiveness using the literature.)
2. Define the components of the content domain that should be represented in the assessment. (Select the content areas that are specific to effective undergraduate college teaching, not graduate school or adult learning.)
3. Write items/questions that reflect this defined content domain. (Write response items for each component.)
4. Have a panel of topic experts review the items/questions. (One common way to quantify this review is sketched below.)
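Step 4 is often summarized quantitatively with Lawshe's content validity ratio (CVR). This is not from the slides, just one widely used option; a minimal sketch, assuming each panelist rates every item as "essential" or not:

```python
# Hypothetical sketch: quantifying an expert panel review with Lawshe's
# content validity ratio (CVR). Not the slides' method; one common option.
def content_validity_ratio(n_essential: int, n_panelists: int) -> float:
    """CVR = (n_e - N/2) / (N/2); ranges from -1 (none) to +1 (all agree)."""
    half = n_panelists / 2
    return (n_essential - half) / half

# Example: 9 of 10 experts rate an item "essential".
print(content_validity_ratio(9, 10))  # 0.8 -> item is likely retained
```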
FACE VALIDITY
 A TEST THAT APPEARS TO ADEQUATELY MEASURE THE LEARNING OUTCOMES AND CONTENT IS SAID TO POSSESS FACE VALIDITY.
 AS THE NAME SUGGESTS, IT LOOKS AT THE SUPERFICIAL, FACE VALUE OF THE INSTRUMENT.
INSTRUCTIONAL VALIDITY

 THE EXTENT TO WHICH AN ASSESSMENT IS SYSTEMATICALLY SENSITIVE TO THE NATURE OF THE INSTRUCTION OFFERED.
 THIS IS CLOSELY RELATED TO INSTRUCTIONAL SENSITIVITY, WHICH IS DEFINED AS THE “DEGREE TO WHICH STUDENTS’ PERFORMANCE ON A TEST ACCURATELY REFLECTS THE QUALITY OF INSTRUCTION TO PROMOTE STUDENTS’ MASTERY OF WHAT IS BEING ASSESSED.”
TABLE OF SPECIFICATIONS

 IT IS A TEST BLUEPRINT THAT IDENTIFIES THE CONTENT AREAS AND DESCRIBES THE LEARNING OUTCOMES AT EACH LEVEL OF THE COGNITIVE DOMAIN.
No. of Items for a Topic = (No. of Recitation Days for the Topic ÷ Total No. of Recitation Days) × Total No. of Items

1/4 × 30 = 7.5, or 8
1/5 × 30 = 0.20 × 30 = 6
2/5 × 30 = 0.4 × 30 = 12
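A minimal sketch of the same arithmetic; the topic names and day counts below are hypothetical:

```python
import math

# Item allocation for a table of specifications, mirroring the slide's
# arithmetic: items = (topic's recitation days / total days) x total items.
recitation_days = {"Topic A": 10, "Topic B": 8, "Topic C": 16, "Topic D": 6}
total_items = 30
total_days = sum(recitation_days.values())  # 40

for topic, days in recitation_days.items():
    exact = days / total_days * total_items
    items = math.floor(exact + 0.5)  # round half up, as 7.5 -> 8 on the slide
    print(f"{topic}: {days}/{total_days} x {total_items} = {exact} -> {items} items")
# Note: rounding each topic can make the total drift by an item or two;
# adjust one topic's count so the allocations sum to total_items.
```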
In order to gather evidence that an instrument is valid, we need to establish what it is measuring.

THREE (3) SOURCES OF INFORMATION THAT CAN BE USED TO ESTABLISH VALIDITY:

B. CRITERION-RELATED EVIDENCE/VALIDITY (the right criterion)

REFERS TO THE DEGREE TO WHICH TEST SCORES AGREE WITH AN EXTERNAL CRITERION. AS SUCH, IT IS RELATED TO EXTERNAL VALIDITY. IT EXAMINES THE RELATIONSHIP BETWEEN AN ASSESSMENT AND ANOTHER MEASURE OF THE SAME TRAIT.
- Do the instrument’s scores align with one or more standards or outcomes related to the instrument’s intent?
TWO TYPES OF CRITERION-RELATED EVIDENCE

 CONCURRENT VALIDITY
 PREDICTIVE VALIDITY
CONCURRENT VALIDITY

PROVIDES AN ESTIMATE OF A
STUDENT’S CURRENT
PERFORMANCE IN RELATION TO A
PREVIOUSLY VALIDATED OR
ESTABLISHED MEASURE.
CONCURRENT VALIDITY
The assessment scores are valid for indicating current behavior.
Ex.: A group of students take a standardized Math and Reading comprehension aptitude test in 10th grade and receive very low scores. The scores are compared to grades in 10th-grade Algebra and English literature courses. They are equally low.
CONCURRENT VALIDITY
Concurrent validity is a type of criterion validity. If you create some type of test, you want to make sure it is valid: that it measures what it is supposed to measure. Criterion validity is one way of doing that. Concurrent validity measures how well a new test compares to a well-established test. It can also refer to the practice of testing two groups at the same time, or asking two different groups of people to take the same test.

Advantages:

It is a fast way to validate your data.

It is a highly appropriate way to validate personal attributes (e.g., depression, IQ, strengths and weaknesses).
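In practice the concurrent validity coefficient is simply the correlation between scores on the new test and scores on the established measure, both collected at about the same time. A minimal sketch, with illustrative score lists that are not from the slides:

```python
# Minimal concurrent validity check: correlate scores on a new test with
# scores on an established test taken at the same time.
from scipy.stats import pearsonr

new_test = [55, 62, 70, 48, 81, 66, 59, 74]
established = [52, 65, 72, 50, 78, 70, 55, 76]

r, p = pearsonr(new_test, established)
print(f"concurrent validity coefficient r = {r:.2f} (p = {p:.3f})")
# A high positive r suggests the new test measures the same trait.
```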
PREDICTIVE VALIDITY
 PERTAINS TO THE POWER OR USEFULNESS OF TEST SCORES TO PREDICT FUTURE PERFORMANCE OR FUTURE OUTCOMES ON SIMILAR CRITERIA.
 The assessment scores are valid for predicting future outcomes.

Example: A group of students take a standardized Math and Verbal aptitude test in 10th grade and score very low. In the students’ senior year, two years later, the students’ Math and Verbal aptitude scores (criterion data) on the SAT (a college entrance test) turn out to be similarly low.
C. CONSTRUCT-RELATED EVIDENCE
 IS AN ASSESSMENT OF THE QUALITY OF THE INSTRUMENT USED.
 IT MEASURES THE EXTENT TO WHICH THE ASSESSMENT IS A MEANINGFUL MEASURE OF AN UNOBSERVABLE TRAIT OR CHARACTERISTIC.
TWO METHODS OF ESTABLISHING CONSTRUCT VALIDITY:

 CONVERGENT VALIDITY
 DIVERGENT VALIDITY
CONVERGENT VALIDITY
OCCURS WHEN MEASURES OF CONSTRUCTS THAT ARE RELATED/SIMILAR ARE IN FACT OBSERVED TO BE RELATED.
DIVERGENT (or DISCRIMINANT) VALIDITY
 OCCURS WHEN CONSTRUCTS THAT ARE UNRELATED ARE IN FACT OBSERVED TO BE UNRELATED.
(Does the instrument show the “right” pattern of interrelationships with other instruments?)
 It means that the indicators of one construct hang together, or converge, but also diverge from, or are negatively associated with, opposite constructs.
 It says that if two constructs A and B are very different, then measures of A and B should not be associated.
DIVERGENT (or DISCRIMINANT) VALIDITY
 For example: We have 10 items that measure political conservatism, and people answer all 10 in similar ways. But we have also put 5 questions in the same questionnaire that measure political liberalism. Our measure of conservatism has discriminant validity if the conservatism items correlate with one another but not with the liberalism items.
 IN 1959, CAMPBELL & FISKE DEVELOPED A STATISTICAL APPROACH CALLED THE “MULTITRAIT-MULTIMETHOD MATRIX (MTMM).”
 MTMM IS A TABLE OF CORRELATIONS ARRANGED TO FACILITATE THE ASSESSMENT OF CONSTRUCT VALIDITY.
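A minimal MTMM-style sketch using pandas; the traits, methods, and scores below are hypothetical. Same-trait/different-method correlations should be high (convergent), while different-trait correlations should be low (discriminant):

```python
# Hypothetical MTMM-style sketch: a correlation matrix over trait-method
# combinations. Data are made up for illustration.
import pandas as pd

scores = pd.DataFrame({
    "anxiety_self_report":      [10, 14, 8, 20, 16, 12],
    "anxiety_observer":         [11, 13, 9, 19, 17, 11],
    "sociability_self_report":  [25, 30, 22, 27, 18, 31],
    "sociability_observer":     [26, 29, 23, 26, 20, 30],
})

print(scores.corr().round(2))
# Expect high r for anxiety_self_report vs anxiety_observer (convergent)
# and near-zero r for anxiety_* vs sociability_* (discriminant).
```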
THREATS TO VALIDITY

MILLER, LINN & GRONLUND (2009) IDENTIFIED TEN FACTORS THAT AFFECT THE VALIDITY OF ASSESSMENT RESULTS. THESE FACTORS ARE DEFECTS IN THE CONSTRUCTION OF ASSESSMENT TASKS THAT WOULD RENDER ASSESSMENT INFERENCES INACCURATE.
1. UNCLEAR TEST DIRECTIONS
2. COMPLICATED VOCABULARY & SENTENCE STRUCTURE
3. AMBIGUOUS STATEMENTS
4. INADEQUATE TIME LIMITS
5. INAPPROPRIATE LEVEL OF DIFFICULTY OF TEST ITEMS
6. POORLY CONSTRUCTED TEST ITEMS
7. INAPPROPRIATE TEST ITEMS FOR THE OUTCOMES BEING MEASURED
8. SHORT TEST
9. IMPROPER ARRANGEMENT OF ITEMS
10. IDENTIFIABLE PATTERN OF ANSWERS
Reliability talks about reproducibility and consistency in methods and criteria. An assessment is said to be reliable if it produces the same results when given to the same examinee on two occasions.
 It is important then to stress that reliability:
 1. Pertains to the obtained assessment results, not to the test or any other instrument.
 2. Is unlikely to be 100%, because no two tests will consistently produce identical results.
 3. Is affected by environmental factors like lighting and noise, student error, and the physical well-being of examinees, all of which affect the consistency of assessment results.
 Reliability is expressed as a correlation coefficient.
 One common choice is the Karl Pearson formula (Pearson r).
OBJECTIVES
1. Discuss Pearson r
2. Solve for the value of r
3. Determine the strength of the relationship
4. Explain the importance of Pearson r
PEARSON r
 It is the most common method to use for numerical variables.
 It is a measure of the strength of the linear relationship between two variables.
 The correlation coefficient takes on values ranging between -1 and +1.

TYPES OF CORRELATION (Direction of Relationship)

1. Positive Correlation – As variable X increases, variable Y also increases. (Teaching performance affects the performance of the student positively.)
Ex.: As the performance of the teacher increases, the performance of the student also increases.
 The correlation coefficient takes on values ranging between -1 and +1.

TYPES OF CORRELATION (Direction of Relationship)

2. Negative Correlation – As variable X increases, variable Y decreases; conversely, as one variable decreases, the other increases.
Ex.: Travel time and happiness: as your travel time becomes longer, your happiness becomes lower.
 The correlation coefficient takes on values ranging between -1 and +1. We cannot have an answer beyond 1.

TYPES OF CORRELATION (Direction of Relationship)

3. No Correlation – One variable does not affect the other variable.
FORMULA

r = [N∑xy − (∑x)(∑y)] ÷ √{[N∑x² − (∑x)²][N∑y² − (∑y)²]}

N = number of paired scores
∑x = summation of the values of X
∑y = summation of the values of Y
∑xy = summation of the products of X and Y
∑x² = summation of X²
∑y² = summation of Y²
1. Fill in the table

Subject | Age (X) | Glucose Level (Y) | XY | X² | Y²
1 | 43 | 99 | 4257 | 1849 | 9801
2 | 21 | 65 | 1365 | 441 | 4225
3 | 42 | 75 | 3150 | 1764 | 5625
4 | 57 | 87 | 4959 | 3249 | 7569
5 | 59 | 81 | 4779 | 3481 | 6561
SUM | ∑x = 222 | ∑y = 407 | ∑xy = 18510 | ∑x² = 10784 | ∑y² = 33781
2. Substitute the values in the given formula

∑x = 222, ∑y = 407, ∑xy = 18510, ∑x² = 10784, ∑y² = 33781

r = [5(18510) − (222)(407)] ÷ √{[5(10784) − (222)²][5(33781) − (407)²]}

3. Solve for the value of r. STEPS TO FOLLOW:

1. Multiply.
2. Subtract.
3. Multiply the denominator terms.
4. Get the square root.
5. Lastly, divide.

r = 2196 ÷ √(4636 × 3256) = 2196 ÷ 3885.20 ≈ 0.57
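A minimal sketch that follows these five steps on the age/glucose data from the table above:

```python
# Compute Pearson r with the raw-score formula on the age/glucose data.
from math import sqrt

x = [43, 21, 42, 57, 59]   # Age
y = [99, 65, 75, 87, 81]   # Glucose level
n = len(x)

sum_x, sum_y = sum(x), sum(y)              # 222, 407
sum_xy = sum(a * b for a, b in zip(x, y))  # 18510
sum_x2 = sum(a * a for a in x)             # 10784
sum_y2 = sum(b * b for b in y)             # 33781

numerator = n * sum_xy - sum_x * sum_y     # steps 1-2: multiply, subtract
denominator = sqrt((n * sum_x2 - sum_x**2) * (n * sum_y2 - sum_y**2))  # steps 3-4
r = numerator / denominator                # step 5: divide
print(f"r = {r:.2f}")  # r = 0.57 -> a moderate relationship (see table below)
```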
4. Determine the strength of the r-value

r-value | Strength of Relationship
r < 0.30 | None or Very Weak Relationship
0.30 < r < 0.50 | Weak Relationship
0.50 < r < 0.70 | Moderate Relationship
r > 0.70 | Strong Relationship


Try it on your own

TEACHER PERFORMANCE (X) | STUDENT PERFORMANCE (Y)
4 | 89
4 | 96
3 | 88
4 | 85
3 | 90
Try it on your own

SERVICE QUALITY (X) | CUSTOMER SATISFACTION (Y)
5 | 4
4 | 5
5 | 4
5 | 5
4 | 5
 Types of Reliability
 1. Internal Consistency Reliability – assesses the consistency of results across items within a test. It is a way to gauge how well a test or survey is measuring what you want it to measure.
 2. External Reliability – gauges the extent to which a measure varies from one use to another.
 Sources of Reliability Evidence
 1. Evidence based on stability
 2. Evidence based on equivalent forms
 3. Evidence based on internal consistency
 4. Evidence based on scorer or rater consistency
 5. Evidence based on decision consistency
Stability
Test-retest reliability correlates scores obtained from two administrations of the same test over a period of time. It is used to determine the stability of test results over time.
 Equivalence
 Equivalent forms method (also called alternate or parallel forms) – In this method, two different versions of an assessment tool are administered to the same group of individuals.
 Internal consistency
 Internal consistency implies that a student who has mastery learning will get all or most of the items correct, while a student who knows little or nothing about the subject matter will get all or most of the items wrong. To get the internal consistency, the split-half method can be done by dividing the test into two.
 First, divide the test into halves, usually using the odd-even technique.
 Second, find the correlation of the scores using the Pearson r formula.
 Third, adjust and re-evaluate the correlation using the Spearman-Brown formula (sketched below).
 The purpose of re-evaluating the correlation is to determine the reliability of the test as a whole; for this we use the Spearman-Brown formula.
 Analysis: a result of 0.98 is close to +1; hence the test is highly reliable.
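The Spearman-Brown step can be sketched as follows. The split-half correlation of 0.96 is an assumed value, chosen only so the stepped-up result matches the 0.98 cited above:

```python
# Sketch of the split-half procedure's final step: step the half-test
# correlation up to whole-test reliability with Spearman-Brown.
def spearman_brown(r_half: float) -> float:
    """Whole-test reliability from a split-half (odd vs even) correlation."""
    return 2 * r_half / (1 + r_half)

r_half = 0.96  # assumed split-half Pearson r (not given in the slides)
print(f"{spearman_brown(r_half):.2f}")  # 0.98 -> highly reliable
```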
 Scorer or rater consistency
 Inter-rater reliability is the degree to which different raters, observers, or judges agree in their assessment decisions. To mitigate rating errors, careful selection and training of good judges and the use of applicable techniques are suggested.
 Spearman’s rho or Cohen’s Kappa may be used to calculate the correlation coefficient between or among the ratings.
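A minimal sketch of the Cohen's Kappa option on categorical pass/fail decisions; the ratings below are illustrative, not from the slides:

```python
# Inter-rater reliability via Cohen's kappa on two raters' pass/fail calls.
from sklearn.metrics import cohen_kappa_score

rater_a = ["pass", "pass", "fail", "pass", "fail", "pass", "fail", "pass"]
rater_b = ["pass", "pass", "fail", "fail", "fail", "pass", "fail", "pass"]

kappa = cohen_kappa_score(rater_a, rater_b)
print(f"Cohen's kappa = {kappa:.2f}")  # 1.0 = perfect agreement, 0 = chance level
```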
SPEARMAN RANK CORRELATION
 It is the nonparametric version of the Pearson product-moment correlation.
 Spearman’s correlation coefficient measures the strength and direction of association between two ranked variables.
 The correlation coefficient takes on values ranging between -1 and +1.

Sr = 1 − [6∑D² ÷ n(n² − 1)]
n = number of pairs
∑D² = summation of the squares of the differences between the two ranks
1. RANK THE GRADES

STUDENT | RATING X | RATING Y | RANK X | RANK Y
A | 73 | 77 | 6 | 7
B | 76 | 78 | 5 | 6
C | 78 | 79 | 4 | 5
D | 65 | 80 | 7 | 4
E | 86 | 86 | 2 | 3
F | 82 | 89 | 3 | 2
G | 91 | 95 | 1 | 1
2. FILL IN THE TABLE

RANK X | RANK Y | D | D²
6 | 7 | -1 | 1
5 | 6 | -1 | 1
4 | 5 | -1 | 1
7 | 4 | 3 | 9
2 | 3 | -1 | 1
3 | 2 | 1 | 1
1 | 1 | 0 | 0
3. SUBSTITUTE THE VALUES IN THE GIVEN FORMULA

∑D² = 14, n = 7

Sr = 1 − [6(14) ÷ (7³ − 7)]
4. SOLVE FOR THE VALUE OF Sr (n = 7)

Sr = 1 − [6(14) ÷ (7³ − 7)]
Sr = 1 − [84 ÷ (343 − 7)]
Sr = 1 − (84 ÷ 336)
Sr = 1 − 0.25
Sr = 0.75
5. DETERMINE THE STRENGTH OF THE Sr VALUE (0.75 > 0.70: a strong relationship)
 Spearman’s rho formula

STUDENT | RATING X | RATING Y | RANK X | RANK Y | D = (RANK X − RANK Y) | D²
A | 73 | 77 | 6 | 7 | -1 | 1
B | 76 | 78 | 5 | 6 | -1 | 1
C | 78 | 79 | 4 | 5 | -1 | 1
D | 65 | 80 | 7 | 4 | 3 | 9
E | 86 | 86 | 2 | 3 | -1 | 1
F | 82 | 89 | 3 | 2 | 1 | 1
G | 91 | 95 | 1 | 1 | 0 | 0
SUM | | | | | 0 | 14

R = 1 − [6(14) ÷ 7(7² − 1)]
= 1 − (84 ÷ 336)
= 1 − 0.25
= 0.75 → strong correlation
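As a quick check, SciPy reproduces the hand computation (it ranks the raw ratings internally):

```python
# Verify the hand-computed Spearman rho on the seven students' ratings.
from scipy.stats import spearmanr

rating_x = [73, 76, 78, 65, 86, 82, 91]
rating_y = [77, 78, 79, 80, 86, 89, 95]

rho, p = spearmanr(rating_x, rating_y)
print(f"rho = {rho:.2f}")  # 0.75, matching the worked example above
```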
TRY IT ON YOUR OWN

SCIENCE | ARALING PANLIPUNAN
70 | 67
59 | 65
60 | 45
75 | 40
48 | 80
39 | 73
 Decision consistency describes how consistent the classification decisions are, rather than how consistent the scores are. Decision consistency is seen in situations where teachers decide who will receive a passing or failing mark, or who is considered to possess mastery.
 Measurement Errors
 Measurement errors can be caused by the following:
 1. Examinee-specific factors like fatigue, boredom, lack of motivation, lapses of memory, and carelessness.
 2. Lack of sleep.
 3. The student’s physical condition.
 4. Teachers who provide poor or insufficient directions.
 5. Inconsistent grading systems, carelessness, and computational errors, which lead to imprecise or erroneous student evaluations.
 The error component includes random and systematic error.
 1. Random errors – produce random fluctuations in measurement scores.
 2. Systematic errors – also called systematic bias; a consistent, repeatable error associated with faulty equipment.
 The standard error of measurement (SEM) is an index of the expected variation of observed scores due to measurement errors. Better reliability means a lower SEM. SEM pertains to the standard deviation of the measurement errors associated with test scores.
 SEM is used to calculate confidence intervals around obtained scores.
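The slides do not spell out the formula; a standard one is SEM = SD × √(1 − reliability). A minimal sketch of the confidence-interval use, with illustrative numbers:

```python
# Standard error of measurement and a 95% confidence interval around an
# obtained score. All numbers below are illustrative.
from math import sqrt

sd = 10.0           # standard deviation of test scores
reliability = 0.91  # e.g., a test-retest or internal-consistency estimate
score = 75          # a student's obtained score

sem = sd * sqrt(1 - reliability)          # 10 * sqrt(0.09) = 3.0
low, high = score - 1.96 * sem, score + 1.96 * sem
print(f"SEM = {sem:.1f}; 95% CI: {low:.1f} to {high:.1f}")
# Higher reliability -> smaller SEM -> tighter interval, as noted above.
```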
 Reliability of Assessment Methods
 Between well-constructed objective tests and performance assessments, the former have better reliability.
 For oral questioning, suggestions for improving the reliability of written tests may also be extended to oral examinations: increasing the number of questions, the response time, and the number of examiners, and using a rubric or marking guide that contains the criteria and standards.
 The reliability of direct observation data can be enhanced through inter-observer agreement and intra-observer reliability.
 Self-assessments have high consistency if they are done by students who have been trained in how to evaluate their own work.
Thank you!
