Validity & Reliability (Chapter 4 - Learning Assessment)
1/4 x 30 = 7.5 or 8
1/5 x 30 = 0.20 x 30 = 6
2/5 x 30 = 0.40 x 30 = 12
In order to gather evidence that an instrument is valid, we need to establish that it is measuring what it is supposed to measure.
CONCURRENT VALIDITY AND PREDICTIVE VALIDITY
CONCURRENT VALIDITY
PROVIDES AN ESTIMATE OF A STUDENT'S CURRENT PERFORMANCE IN RELATION TO A PREVIOUSLY VALIDATED OR ESTABLISHED MEASURE.
CONCURRENT VALIDITY
The assessment scores are valid for indicating current behavior.
Example: A group of students take a standardized Math and Reading comprehension aptitude test in 10th Grade and receive very low scores. The scores are compared to grades in 10th Grade Algebra and English Literature courses, and those grades are equally low.
Concurrent validity is a type of criterion validity. If you create some type of test, you want to make sure it is valid: that it measures what it is supposed to measure. Criterion validity is one way of doing that. Concurrent validity measures how well a new test compares to a well-established test. It can also refer to the practice of testing two groups at the same time, or asking two different groups of people to take the same test.
CONVERGENT VALIDITY AND DIVERGENT VALIDITY
CONVERGENT VALIDITY
OCCURS WHEN MEASURES OF CONSTRUCTS THAT ARE RELATED/SIMILAR ARE IN FACT OBSERVED TO BE RELATED.
DIVERGENT (or DISCRIMINANT) VALIDITY
OCCURS WHEN CONSTRUCTS THAT ARE UNRELATED ARE IN REALITY OBSERVED NOT TO BE RELATED.
(Does the instrument show the "right" pattern of interrelationships with other instruments?)
It means that the indicators of one construct hang together, or converge, but also diverge from, or are negatively associated with, measures of opposite constructs.
It says that if two constructs A and B are very different, then measures of A and B should not be associated.
DIVERGENT (or DISCRIMINANT) VALIDITY
For example: We have 10 items that measure political conservatism, and people answer all 10 in similar ways. But we have also put 5 questions in the same questionnaire that measure political liberalism. Our measure of conservatism has discriminant validity if the responses to the conservatism items do not correlate highly with (or correlate negatively with) the responses to the liberalism items.
IN 1959, CAMPBELL & FISKE DEVELOPED A STATISTICAL APPROACH CALLED THE "MULTITRAIT-MULTIMETHOD MATRIX (MTMM)".
MTMM IS A TABLE OF CORRELATIONS ARRANGED TO FACILITATE THE ASSESSMENT OF CONSTRUCT VALIDITY.
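As a rough sketch of the MTMM idea (not Campbell & Fiske's full analysis), the Python snippet below builds a correlation matrix for two hypothetical traits, each measured by two methods: same-trait, different-method correlations should come out high (convergent validity) and different-trait correlations near zero (discriminant validity). All data and variable names are made up for illustration.

```python
# Minimal sketch of the MTMM idea using a plain correlation matrix.
# All data below are hypothetical; this is not Campbell & Fiske's full procedure.
import numpy as np

rng = np.random.default_rng(0)
n = 100

# Two underlying (unrelated) traits
trait_a = rng.normal(size=n)
trait_b = rng.normal(size=n)

# Each trait measured by two different methods (trait signal + method noise)
measures = {
    "A_method1": trait_a + rng.normal(scale=0.4, size=n),
    "A_method2": trait_a + rng.normal(scale=0.4, size=n),
    "B_method1": trait_b + rng.normal(scale=0.4, size=n),
    "B_method2": trait_b + rng.normal(scale=0.4, size=n),
}

names = list(measures)
matrix = np.corrcoef([measures[k] for k in names])

# Convergent validity: same trait, different methods -> high correlation
# Discriminant validity: different traits -> near-zero correlation
for i, row_name in enumerate(names):
    for j, col_name in enumerate(names):
        if i < j:
            print(f"{row_name} vs {col_name}: r = {matrix[i, j]:.2f}")
```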
THREATS TO VALIDITY
Pearson r, substituting the given values:
r = [5(18510) – (222)(407)] / √{[5(10784) – (222)²][5(33781) – (407)²]}
X   Y
3   88
4   85
3   90

Try it on your own:

X   Y
5   4
5   5
4   5
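As a minimal sketch, the raw-score Pearson r formula can be evaluated directly from the sums substituted above (n = 5, ∑XY = 18510, ∑X = 222, ∑Y = 407, ∑X² = 10784, ∑Y² = 33781); the helper function below is only an illustration and can be reused with other sums.

```python
# Minimal sketch: raw-score Pearson r, evaluated with the sums shown above.
from math import sqrt

def pearson_r_from_sums(n, sum_xy, sum_x, sum_y, sum_x2, sum_y2):
    """r = [n*sum(XY) - sum(X)*sum(Y)] / sqrt{[n*sum(X^2) - sum(X)^2][n*sum(Y^2) - sum(Y)^2]}"""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Values taken from the substitution above
print(pearson_r_from_sums(n=5, sum_xy=18510, sum_x=222, sum_y=407,
                          sum_x2=10784, sum_y2=33781))
```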
Types of Reliability
1. Internal Consistency Reliability - assesses the consistency of results across items within a test; it is a way to gauge how well a test or survey measures what you want it to measure.
2. External Reliability - gauges the extent to which a measure varies from one use to another.
Sources of Reliability Evidence
1. Evidence based on stability
2. Evidence based on equivalent forms
3. Evidence based on internal consistency
4. Evidence based on scorer or rater consistency
5. Evidence based on decision consistency
Stability
Test-retest reliability correlates scores obtained from two administrations of the same test over a period of time. It is used to determine the stability of test results over time.
Equivalence
Equivalent forms method (also called alternate or
parallel) - In this method, two different versions
of an assessment tool are administered to the
same group of individuals.
Internal consistency
Internal consistency implies that a student who has attained mastery will get all or most of the items correct, while a student who knows little or nothing about the subject matter will get all or most of the items wrong. To estimate internal consistency, the split-half method can be used by dividing the test into two halves.
First, divide the test into halves, usually using the odd-even technique.
Second, find the correlation of the scores on the two halves using the Pearson r formula.
Third, adjust and re-evaluate the correlation using the Spearman-Brown formula.
The purpose of re-evaluating the correlation is to determine the reliability of the test as a whole. To determine the reliability of the whole test, we use the Spearman-Brown formula: whole-test reliability = 2r / (1 + r), where r is the correlation between the two halves.
Analysis: the result above, 0.98, is close to +1; hence, the test is highly reliable.
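A minimal sketch of the split-half procedure just described, using hypothetical odd/even half scores (not the data behind the 0.98 result above): correlate the two halves with Pearson r, then apply the Spearman-Brown formula to estimate whole-test reliability.

```python
# Minimal sketch of split-half reliability with a Spearman-Brown adjustment.
# The half-test scores below are hypothetical.
import numpy as np

odd_half  = np.array([10, 12, 9, 14, 13, 8])   # scores on odd-numbered items
even_half = np.array([11, 12, 10, 13, 14, 7])  # scores on even-numbered items

# Steps 1-2: correlation between the two halves (Pearson r)
r_half = np.corrcoef(odd_half, even_half)[0, 1]

# Step 3: Spearman-Brown adjustment to estimate whole-test reliability
r_full = (2 * r_half) / (1 + r_half)

print(f"half-test r = {r_half:.2f}, Spearman-Brown whole-test reliability = {r_full:.2f}")
```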
Scorer or rater consistency
Inter-rater reliability is the degree to which different raters, observers, or judges agree in their assessment decisions. To mitigate rating errors, careful selection and training of good judges and the use of applicable techniques are suggested. Spearman's rho or Cohen's kappa may be used to quantify the agreement between or among the ratings.
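As an illustration, Cohen's kappa for two raters can be computed as kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The ratings in the sketch below are hypothetical.

```python
# Minimal sketch of Cohen's kappa for two raters (hypothetical ratings).
from collections import Counter

rater1 = ["pass", "pass", "fail", "pass", "fail", "pass", "fail", "pass"]
rater2 = ["pass", "fail", "fail", "pass", "fail", "pass", "pass", "pass"]

n = len(rater1)

# Observed agreement: proportion of cases where the raters give the same label
p_o = sum(a == b for a, b in zip(rater1, rater2)) / n

# Chance agreement: product of each rater's marginal proportions, summed over labels
c1, c2 = Counter(rater1), Counter(rater2)
p_e = sum((c1[label] / n) * (c2[label] / n) for label in set(rater1) | set(rater2))

kappa = (p_o - p_e) / (1 - p_e)
print(f"observed agreement = {p_o:.2f}, kappa = {kappa:.2f}")
```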
Spearman Rank Correlation
It is the nonparametric version of the Pearson product-moment correlation.
Spearman's correlation coefficient measures the strength and direction of association between two ranked variables.
The correlation coefficient takes on values ranging between -1 and +1.
rho = 1 – 6∑D² / [n(n² – 1)]
where n = number of pairs and ∑D² = the sum of the squared differences between each pair of ranks.
1. RANK THE GRADES
STUDENTS RATING X RATING Y RANK X RANK Y
A 73 77 6 7
B 76 78 5 6
C 78 79 4 5
D 65 80 7 4
E 86 86 2 3
F 82 89 3 2
G 91 95 1 1
2. FILL IN THE TABLE
RANK X  RANK Y  D   D²
6       7      -1   1
5       6      -1   1
4       5      -1   1
7       4       3   9
2       3      -1   1
3       2       1   1
1       1       0   0
TOTAL: ∑D² = 14
3. SUBSTITUTE THE VALUES INTO THE GIVEN FORMULA
Sr = 1 – 6(14) / (7³ – 7)
4. SOLVE FOR THE VALUE OF Sr (n = 7)
Sr = 1 – 6(14) / (343 – 7)
Sr = 1 – 84 / 336
Sr = 1 – 0.25
Sr = 0.75
5. DETERMINE THE STRENGTH OF THE Sr VALUE
Spearman's rho: R = 1 – 6(14) / [7(7² – 1)] = 1 – (84 / 336) = 1 – 0.25 = 0.75, a strong positive correlation.
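The same computation as a short sketch: applying rho = 1 - 6∑D² / [n(n² - 1)] to the ranks from the worked example reproduces the 0.75 above, and the same function can be reused for the practice data that follows. (This simple formula assumes no tied ranks.)

```python
# Minimal sketch of Spearman's rho from the ranks in the worked example above.
def spearman_rho(rank_x, rank_y):
    n = len(rank_x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))

# Ranks from the table above (students A-G)
rank_x = [6, 5, 4, 7, 2, 3, 1]
rank_y = [7, 6, 5, 4, 3, 2, 1]

print(spearman_rho(rank_x, rank_y))  # 0.75, a strong positive correlation
```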
TRY IT ON YOUR OWN
SCIENCE ARALING PANLIPUNAN
70 67
59 65
60 45
75 40
48 80
39 73
Decision consistency
Decision consistency describes how consistent the classification decisions are, rather than how consistent the scores are. Decision consistency is seen in situations where teachers decide who will receive a passing or failing mark, or who is considered to possess mastery or not.
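As a simple illustration, one rough index of decision consistency is the proportion of examinees who receive the same pass/fail (or mastery/non-mastery) classification on two administrations or forms; the classifications below are hypothetical.

```python
# Minimal sketch: decision consistency as the proportion of matching
# pass/fail classifications across two test administrations (hypothetical data).
form_a = ["pass", "pass", "fail", "pass", "fail", "pass"]
form_b = ["pass", "fail", "fail", "pass", "fail", "pass"]

agreement = sum(a == b for a, b in zip(form_a, form_b)) / len(form_a)
print(f"decision consistency = {agreement:.2f}")  # 5 of 6 decisions agree
```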
Measurement Errors
Measurement errors can be caused by the following:
1. Examinee-specific factors like fatigue, boredom, lack of motivation, lapses of memory, and carelessness
2. Lack of sleep
3. Students' physical condition
4. Teachers who provide poor or insufficient directions
5. Inconsistent grading systems, carelessness, and computational errors that lead to imprecise or erroneous student evaluations
The error component includes random and
systematic error.
1. Random errors - produce random fluctuations in measurement scores.
2. Systematic errors - also called systematic bias; a consistent, repeatable error associated with faulty equipment.
The standard error of measurement (SEM) is an index of the expected variation of observed scores due to measurement errors. Better reliability means a lower SEM. SEM pertains to the standard deviation of the measurement errors associated with test scores.
SEM is used to calculate confidence intervals around obtained scores.
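A minimal sketch of these relationships, assuming hypothetical test statistics: SEM = SD × √(1 − reliability), and an approximate 95% confidence band of about ±1.96 SEM around an obtained score.

```python
# Minimal sketch: standard error of measurement and a confidence band
# around an observed score (hypothetical test statistics).
from math import sqrt

sd = 8.0            # standard deviation of the test scores
reliability = 0.91  # reliability coefficient of the test
observed = 75       # a student's obtained score

sem = sd * sqrt(1 - reliability)                          # higher reliability -> lower SEM
low, high = observed - 1.96 * sem, observed + 1.96 * sem  # ~95% interval

print(f"SEM = {sem:.2f}; 95% band: {low:.1f} to {high:.1f}")
```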
Reliability of Assessment Method
Between well-constructed objective tests and performance assessments, the former have better reliability.
For oral questioning, the suggestions for improving the reliability of written tests may also be extended to oral examinations, such as increasing the number of questions, the response time, and the number of examiners, and using a rubric or marking guide that contains the criteria and standards.
For direct observation data, reliability can be enhanced through inter-observer agreement and intra-observer reliability checks.
Self-assessments have high consistency if they are done by students who have been trained in how to evaluate their own work.
Thank you!