
ATENEO TEACHER TRAINING CENTER

Naga City

Handout 3

Learning Outcome

At the end of the lesson, you should be able to:

1. Obtain a deep understanding of procedures and approaches that enhance high-quality assessment.

Learning Output

1. Interview a teacher in your field of teaching interest, and ask him/her to identify some
qualities that s/he believes contribute to good assessment.

Characteristics of High-Quality Assessments

High-quality assessments provide results that demonstrate and improve targeted student learning.
They also inform instructional decision making.

Is the test I am using a good one? Have I used the right test? These are just two of the questions
some teachers have in mind as the process of assessment unfolds. To ensure the quality of any test,
the following criteria must be considered:

Clear and appropriate learning targets. In designing good assessment, I must begin by asking
whether the learning targets are at the right level of difficulty to motivate students and whether there is
an adequate balance among the different types of learning targets.

A learning target is simply a clear description of what students should know and be able to do.
Stiggins and Conklin (1992) categorize learning targets into five types. These are:

1. Knowledge learning target is the ability of the student to master substantive subject matter.
2. Reasoning learning target is the ability to use knowledge and solve problems.
3. Skill learning target is the ability to demonstrate achievement-related skills like conducting
experiments, playing basketball, and operating computers.
4. Product learning target is the ability to create achievement-related products such as written
reports, oral presentations and art products.
5. Affective learning target is the attainment of affective traits such as attitudes, values,
interests, and self-efficacy.

Appropriateness of assessment methods. Once I have identified the learning targets, I have to
match them with the appropriate methods. I must consider the strengths of different methods in measuring
different targets.

Matching Learning Targets with Assessment Methods

(Ratings range from 1 = poor match to 5 = excellent match.)

Targets       Objective   Essay   Performance-Based   Oral Question   Observation   Self-Report
Knowledge         5          4            3                  4              3              2
Reasoning         2          5            4                  4              2              2
Skills            1          3            5                  2              5              3
Products          1          1            5                  2              4              4
Affect            1          2            4                  4              4              5
Validity. This refers to the extent to which a test serves its purpose, or how well it measures what
it intends to measure. Validity is a characteristic that refers to the appropriateness of the inferences,
uses, and consequences that result from the test or other method of gathering data.

There are factors that influence the validity of a test, namely: (1) appropriateness of test items,
(2) directions, (3) reading vocabulary and sentence structure, (4) pattern of answers, and (5) arrangement
of items.

How is validity determined?

Validity is always determined by professional judgment. However, there are different types of
evidence to use in determining validity. The following major sources of information can be used to establish
validity.

Content-related validity determines the extent to which the assessment is representative of the
domain of interest. Once the domain of content is completely specified, I must review the test items to be
assured that there is a match between the intended inferences and what is on the test. A test blueprint or a
table of specifications will help me further delineate what targets I intend to assess and what is important
from the content domain.
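For illustration only, a minimal table of specifications for a hypothetical 20-item unit test might
look like the following; the topics, hours, and item counts are invented for this example:

Content Topic    Hours Taught   Remembering   Applying   Analyzing   Total Items
Fractions              4             2            3           1            6
Decimals               4             2            2           2            6
Percentages            6             3            3           2            8
Total                 14             7            8           5           20

Allocating items in proportion to instructional time, as here, is one common way of keeping the test
representative of the content domain.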

Curricular validity refers to how well test items reflect the actual curriculum (i.e., a test is supposed
to measure what is in the curriculum). It usually refers to a specific, well-defined curriculum like those
provided by the government (CHED and DepEd) to schools. Curricular validity is judged by a panel of
curriculum experts. It is not measured statistically, but rather by a rating of valid or not valid.

Teachers follow a curriculum, and students learn what is in the curriculum through their teachers.
However, it does not always follow that a child will be taught what is in the curriculum. Many things can
affect which parts of the curriculum are taught (or not taught), including inexperienced teachers, substitute
teachers, poorly managed schools or flow of information, and teachers who choose not to teach specific
parts of the curriculum or who skip over parts they do not fully understand.

Criterion-related validity determines the relationship between an assessment and another measure
of the same trait. It provides such validity by relating an assessment to some valued measure (criterion)
that either provides an estimate of current performance (concurrent criterion related evidence) or predicts
future performance (predictive criterion-related evidence).
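As a minimal sketch (not part of the original handout) of concurrent criterion-related evidence in
Python, with the scores, the student count, and the use of SciPy's pearsonr all assumed for illustration:

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical scores of the same 10 students on a new 50-item reading
# test and on an established standardized reading measure (the criterion).
new_test = np.array([38, 42, 25, 47, 30, 35, 44, 28, 40, 33])
criterion = np.array([71, 80, 52, 88, 60, 66, 84, 55, 75, 62])

# Concurrent criterion-related evidence: correlate the new test with the
# criterion measured at roughly the same time. A high r supports validity.
r, p = pearsonr(new_test, criterion)
print(f"Criterion-related validity coefficient: r = {r:.2f} (p = {p:.3f})")
```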

Construct-related validity determines the extent to which the assessment is a meaningful measure
of an unobservable trait or characteristic like intelligence, reading comprehension, honesty, motivation,
attitude, learning style, and anxiety.

Face validity is determined on the basis of the appearance of an assessment. It asks whether, based
on a superficial examination of a test, it seems to be a reasonable measure of the objectives of the domain.
Does the test, on the face of it, look like an adequate measure?

Instructional-related validity determines to what extent the domain of content in the test is taught in
class. It refers to how well the test items reflect what is actually taught. It is an actual measure of whether
the schools are providing the students with instruction in the knowledge and skills measured in a test.

Fit validity determines the appropriateness of assessment to the abilities of the examinees.

Test Validity Enhancers

The following are suggestions for enhancing the validity of classroom assessments.
1. Prepare a table of specifications (TOS).
2. Construct appropriate test items.
3. Formulate directions that are brief, clear, and concise.
4. Consider the reading vocabulary of the examinees. A test should not turn into a vocabulary or jargon test.
5. Make the sentence structure of your test items simple.
6. Never have an identifiable pattern of answers.
7. Arrange the test items from easy to difficult.
8. Provide adequate time for students to complete the assessment.
9. Use different methods to assess the same thing.
10. Use the test only for its intended purposes.

Reliability. This refers to the consistency with which a student may be expected to perform on a
given test. It means the extent to which a test is dependable, self-consistent, and stable. There are factors
that affect test reliability: the scorer's inconsistency due to subjectivity; limited sampling due to the
incidental inclusion or accidental exclusion of some material from the test; changes in the individual
examinee and his or her instability during the examination; and the testing environment.

Test reliability is also influenced by the length of the test, the difficulty of the test, and the
objectivity of the scorer. There are four methods of estimating the reliability of a good measuring
instrument. These methods are as follows: (1) the test-retest method, (2) the parallel-forms method,
(3) the split-half method, and (4) the internal-consistency method.

Test-retest method or Test of Stability. The same measuring instrument is administered twice to the
same group of subjects, and the scores of the first and second administrations are correlated. The
limitations of this method are: (1) when the time interval is short, memory effects may operate; (2) when
the time interval is long, factors such as unlearning and forgetting may occur and may result in a low
correlation; and (3) regardless of the time interval separating the two administrations, other varying
environmental conditions may affect the correlation.

Test-retest reliability, sometimes called retest reliability, measures test consistency: the reliability of
a test measured over time. In other words, give the same test twice to the same group of students at
different times to see if the scores are the same. For example, test on Monday, then again the following
Monday. The two scores are correlated.
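As a minimal sketch of the Monday-to-Monday example above, assuming Python with NumPy and
invented scores for eight students:

```python
import numpy as np

# Hypothetical scores of the same 8 students on the same test,
# administered on two Mondays one week apart.
monday_1 = np.array([18, 25, 22, 30, 15, 27, 20, 24])
monday_2 = np.array([20, 24, 23, 29, 14, 28, 19, 25])

# Test-retest (stability) coefficient: the correlation between the two
# administrations. Values near 1.0 indicate a stable test.
r = np.corrcoef(monday_1, monday_2)[0, 1]
print(f"Test-retest reliability: r = {r:.2f}")
```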

Bias is a known problem with this type of reliability estimate: feedback between tests and
participants gaining knowledge about the purpose of the test can make them more prepared the second
time around.

Parallel-forms method or the Test of Equivalence. It splits one set of questions into two equivalent
sets or forms, where both forms contain questions that measure the same construct, knowledge, and skill.
The two sets of questions are given to the same sample of students within a short period of time, and an
estimate of reliability is calculated from the two sets.

Simply put, it tries to find out whether form A measures the same thing as form B. Example: a teacher
wants to find the reliability of a test of mathematics comprehension, so a set of 100 questions is
constructed to measure that construct. Randomly split the questions into two sets of 50 (set A and set B),
and administer those sets to the same group of students a week apart.

Step 1: Give test A to a group of 50 students on a Monday.
Step 2: Give test B to the same group of students that Friday.
Step 3: Correlate the scores from test A and test B.
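A minimal sketch of the same procedure in Python, where the item pool, the student scores, and
the random seed are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Step 0: randomly split a hypothetical pool of 100 question indices
# into two equivalent 50-item forms (set A and set B).
items = rng.permutation(100)
form_a_items, form_b_items = items[:50], items[50:]
print(f"Form A: {form_a_items.size} items, Form B: {form_b_items.size} items")

# Steps 1-2: hypothetical total scores of the same 6 students on each
# form, administered a few days apart.
scores_a = np.array([41, 35, 28, 45, 38, 30])
scores_b = np.array([40, 37, 27, 44, 36, 31])

# Step 3: the parallel-forms (equivalence) coefficient.
r = np.corrcoef(scores_a, scores_b)[0, 1]
print(f"Parallel-forms reliability: r = {r:.2f}")
```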

Split-half method. The test in this method may be administered once, but the test items are divided
into two halves. The common procedure is to divide a test into odd and even items. The two halves of the
test must be similar but not identical in content, number of items, difficulty, means, and standard deviations.

In split-half reliability, a test for a single knowledge area is split into two parts, and both parts are
given to one group of students at the same time. The scores from both parts of the test are correlated.
A reliable test will have a high correlation, indicating that a student would perform equally well (or as
poorly) on both halves of the test. Split-half testing is a measure of internal consistency: how well the test
components contribute to the construct being measured. These are the steps, followed below by a small
worked sketch:

1. Administer the test to a large group of students, ideally over 30 students.
2. Randomly divide the test questions into two parts. Example: separate even-numbered questions from
odd-numbered questions.
3. Score each half of the test for each student.
4. Find the correlation coefficient for the two halves.
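A minimal sketch of these steps, assuming Python with NumPy and an invented 6-student, 10-item
right/wrong score matrix; the Spearman-Brown step-up at the end (a standard correction the handout does
not name) estimates the reliability of the full-length test from the half-test correlation:

```python
import numpy as np

# Hypothetical 0/1 scores: rows = 6 students, columns = 10 items.
scores = np.array([
    [1, 1, 0, 1, 1, 0, 1, 1, 1, 0],
    [1, 0, 0, 1, 0, 0, 1, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0, 0, 1, 0, 0],
    [1, 1, 0, 1, 1, 1, 1, 0, 1, 1],
    [1, 0, 1, 0, 1, 0, 1, 1, 0, 1],
])

# Steps 2-3: split into odd- and even-numbered items, score each half.
odd_half = scores[:, 0::2].sum(axis=1)   # items 1, 3, 5, 7, 9
even_half = scores[:, 1::2].sum(axis=1)  # items 2, 4, 6, 8, 10

# Step 4: correlate the two halves.
r_half = np.corrcoef(odd_half, even_half)[0, 1]

# Spearman-Brown correction: estimate full-length test reliability.
r_full = 2 * r_half / (1 + r_half)
print(f"Half-test r = {r_half:.2f}, corrected full-test r = {r_full:.2f}")
```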

Internal-consistency method. This method is used with psychological tests that are constructed of
dichotomously scored items: the testee either passes or fails each item. The reliability coefficient in this
method is determined by the Kuder-Richardson formula.

Kuder-Richardson Formula 20 or KR-20 is a measure of reliability for a test with binary items, that
is, answers scored simply as right or wrong. The KR-20 is used for items that have varying difficulty. For
example, some items might be very easy, others more challenging. It should only be used if there is a
single correct answer for each question; it should not be used for questions where partial credit is possible
or for scales like the Likert scale.
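A minimal sketch of the KR-20 computation, assuming Python with NumPy and the same kind of
invented right/wrong matrix; KR-20 = (k / (k - 1)) * (1 - sum(pq) / variance of total scores), and the
sample-variance convention (ddof=1) used below is one common choice:

```python
import numpy as np

def kr20(scores: np.ndarray) -> float:
    """KR-20 for a 0/1 score matrix (rows = students, columns = items)."""
    k = scores.shape[1]                          # number of items
    p = scores.mean(axis=0)                      # proportion answering each item correctly
    q = 1 - p                                    # proportion answering each item incorrectly
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of students' total scores
    return (k / (k - 1)) * (1 - (p * q).sum() / total_var)

# Hypothetical responses: 6 students x 10 dichotomously scored items.
scores = np.array([
    [1, 1, 0, 1, 1, 0, 1, 1, 1, 0],
    [1, 0, 0, 1, 0, 0, 1, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0, 0, 1, 0, 0],
    [1, 1, 0, 1, 1, 1, 1, 0, 1, 1],
    [1, 0, 1, 0, 1, 0, 1, 1, 0, 1],
])
print(f"KR-20 = {kr20(scores):.2f}")
```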

The Concept of Error in Assessment

The concept of error in assessment is critical to our understanding of reliability. Conceptually,
whenever we assess something, we get an observed score or result. This observed score is the sum of the
true score (the real ability or skill) and some degree of error.

Observed Score = True Score + Error

Thus, an observed score can be higher or lower than the true score, depending on the nature of
error. The sources of error are reflected in the table.

Table 7. Sources of Error

Internal Error        External Error
Health                Directions
Mood                  Luck
Motivation            Item ambiguity
Test-taking skills    Heat in the room
Anxiety               Lighting
Fatigue               Sample of items
General ability       Observer differences and bias
                      Test interpretation and scoring
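As a minimal simulation sketch of the equation above (the seed, true score, and error spread are all
invented), random error scatters the observed scores around a fixed true score, so any single result may
land above or below the student's real ability:

```python
import numpy as np

rng = np.random.default_rng(seed=7)

true_score = 80                               # a student's hypothetical true ability
error = rng.normal(loc=0, scale=3, size=5)    # random error with mean zero
observed = true_score + error                 # Observed Score = True Score + Error

for obs, err in zip(observed, error):
    print(f"observed = {obs:5.1f}  (error contribution: {err:+.1f})")
```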

Test Reliability Enhancers

Consider the following for enhancing the reliability of classroom assessments.


1. Use a sufficient number of items or tasks. Other things being equal, a longer test is more reliable.
2. Use independent raters or observers who provide similar scores for the same performances.
3. Make sure the assessment procedures and scoring are as objective as possible.
4. Continue assessment until the results are consistent.
5. Control variation in the testing situation. Eliminate or reduce the influence of extraneous events
or factors. Errors in the testing situation may include students misunderstanding or misreading
the test directions, noise, distractions, and sickness; these can cause test scores to vary.
6. Match the difficulty level of the test to the group. When there is little variability among the test
scores, the reliability will be low. Thus, reliability will be low if a test is so easy that every
student gets most or all of the items correct, or so difficult that every student gets most or all
of the items wrong.
7. Use shorter assessments more frequently rather than fewer long assessments.
8. Consider group heterogeneity. In general, the more heterogeneous the group of students who
take the test, the more reliable the measure will be.

Fairness pertains to the intent that each question should be as clear as possible to the examinees
and that the test is free of any bias. An example of bias in an intelligence test is an item about a person or
object that has not been part of the cultural and educational context of the test taker. In mathematics
tests, the reading difficulty level of an item can be a source of unfairness.

The fairness of a test refers to its freedom from any kind of bias. The test should be appropriate for
all qualified examinees irrespective of race, religion, gender, or age. The test should not disadvantage any
examinee, or group of examinees, on any basis other than the examinee's lack of the knowledge and skills
the test is intended to measure. Item writers should address the goal of fairness as they undertake the task
of writing items. In addition, the items should also be reviewed for potential fairness problems during the
item review phase. Any items identified as displaying potential bias or lack of fairness should then be
revised or removed from further consideration.

There are several elements of fairness. These are: students' knowledge of the learning targets
before instruction, the opportunity to learn, the attainment of prerequisite knowledge and skills, unbiased
assessment tasks and procedures, and teachers who avoid stereotypes.

Positive Consequences. Assessments that have positive consequences for both students and
teachers enhance the overall quality of assessment, particularly through their effect on student motivation
and study habits.

Practicality and Efficiency. Assessment needs to take into consideration the teacher's familiarity
with the method, the time required, the complexity of administration, the ease of scoring and interpretation,
and the cost in order to determine the assessment's practicality and efficiency. Administrability requires
that a test be administered with ease, clarity, and uniformity. Directions must be specific so that students
and teachers will understand exactly what they must do. Scorability demands that a good test is easy to
score. Test results should be readily available to both students and teachers for remedial and follow-up
measures.
