
EL 114

ASSESSMENT IN LEARNING 1

Module 2

PRINCIPLES OF HIGH-QUALITY ASSESSMENT

Objectives

At the end of the module, the learners are expected to:

 identify what constitutes high-quality assessment;
 list down the productive and unproductive uses of tests; and
 classify the various types of tests.

Characteristics of High-Quality Assessments

1. Clear and Appropriate Learning Targets


a) Knowledge learning target—is the ability of the student to master substantive subject
matter.
b) Reasoning learning target—is the ability to use knowledge to reason and solve problems.
c) Skill learning target—is the ability to demonstrate achievement-related skills like
conducting experiments, playing basketball, and operating computers.
d) Product-learning target—is the ability to create achievement-related products such as
written reports, oral presentations, and art products.
e) Affective learning target—is the attainment of affective traits such as attitudes, values,
interests, and self-efficacy.
2. Appropriateness of Assessment Methods
Once the learning targets have been identified, match them with their corresponding
methods by considering the strengths of various methods in measuring different targets.
3. Validity
 This refers to the extent to which a test serves its purpose, or how well it measures
what it intends to measure.
 Validity is a characteristic that pertains to the appropriateness of the inferences, uses,
and results of the test or any other method utilized to gather data.

Factors that influence the validity of the test:


a. How Validity is Determined
i. Content-related validity—determines the extent to which the assessment is the
representative of the domain of interest.
 Once the content domain is specified, review the test items to be assured
that there is a match between the intended inferences and what is on the
test.
 A test blueprint or table of specifications will help further delineate which
targets should be assessed and what is important from the content
domain.
ii. Criterion-related validity—determines the relationship between an assessment
and another measure of the same trait.
 It provides validity evidence by relating an assessment to some valued measure
(criterion) that can either provide an estimate of current performance
(concurrent criterion-related evidence) or predict future performance
(predictive criterion-related evidence). A sketch of this idea appears after
this list.
iii. Construct-related validity—determines whether an assessment is a meaningful
measure of an unobservable trait or characteristic like intelligence, reading
comprehension, honesty, motivation, attitude, learning style, and anxiety.
iv. Face validity—is determined on the basis of the appearance of an assessment:
whether, based on a superficial examination of the test, it seems to be a
reasonable measure of the objectives of a domain.
v. Instructional-related validity—determines the extent to which the content
domain of the test is taught in class.
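Below is a minimal sketch, in Python, of how criterion-related evidence is often
quantified: scores on the test being validated are correlated with a criterion
measure. All names and numbers here are hypothetical, and numpy is assumed to
be available.

    import numpy as np

    # Scores on the test being validated (hypothetical data).
    new_test  = [72, 85, 64, 90, 78, 58, 81, 69]
    # An accepted measure of the same trait, e.g., grades on an established
    # test. Gathered at about the same time, this gives concurrent evidence;
    # gathered later (say, next term's grades), it gives predictive evidence.
    criterion = [70, 88, 60, 93, 75, 62, 84, 65]

    # Either way, the validity coefficient is the correlation between the two.
    validity_coefficient = float(np.corrcoef(new_test, criterion)[0, 1])
    print(f"criterion-related validity coefficient = {validity_coefficient:.2f}")

The closer the coefficient is to 1.0, the stronger the criterion-related evidence.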
b. Test Validity Enhancers
i. Prepare a table of specifications (TOS); a minimal example appears after this list.
ii. Construct appropriate test items.
iii. Formulate directions that are brief, clear, and concise.
iv. Consider the reading vocabulary of the examinees. The test should not be made
up of jargon.
v. Make the sentence structure of your test items simple.
vi. Never have an identifiable pattern of answers.
vii. Arrange the test items from easy to difficult.
viii. Provide adequate time for students to complete the assessment.
ix. Use different methods to assess the same thing.
x. Use the test only for intended purposes.
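As referenced in enhancer (i), the sketch below builds a small table of
specifications in Python. The topics, learning targets, and item counts are
hypothetical; the point is that the planned item allocation is written down and
checked before items are constructed.

    # A minimal sketch of a table of specifications (TOS).
    tos = {
        # topic       : {learning target: number of items}
        "Fractions"   : {"knowledge": 4, "reasoning": 3, "skill": 1},
        "Decimals"    : {"knowledge": 3, "reasoning": 3, "skill": 2},
        "Percentages" : {"knowledge": 2, "reasoning": 4, "skill": 3},
    }

    planned_length = 25  # intended number of items on the test

    actual_length = sum(n for levels in tos.values() for n in levels.values())
    assert actual_length == planned_length, (
        f"TOS allocates {actual_length} items but the test plans {planned_length}"
    )

    # Share of the test devoted to each topic: a rough check that coverage
    # mirrors the instructional emphasis each topic received.
    for topic, levels in tos.items():
        share = sum(levels.values()) / actual_length
        print(f"{topic:12s} {share:.0%} of the test")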
4. Reliability
 This refers to the consistency with which a student may be expected to perform on a
given test.
 It means the extent to which a test is dependable, self-consistent, and stable.

Factors that affect test reliability:

1) scorer’s inconsistency because of his/her subjectivity;


2) limited sampling because of incidental inclusion and accidental exclusion of some
materials in the test;
3) changes in the individual examinee himself/herself and his/her instability during the
examination; and
4) testing environment.
a. How Reliability is Determined
There are various ways of establishing test reliability, and it is influenced by
factors such as the length of the test, the difficulty of the test, and the
objectivity of the scorer. There are four methods of estimating the reliability of
a good measuring instrument.
1. Test-Retest Method or Test of Stability
 The same instrument is administered twice to the same group of subjects.
 The scores from the first and second administrations are correlated, and the
resulting correlation coefficient indicates the stability of the test.
 The limitations of this method are: (1) memory effects may operate when
the time interval is short; (2) factors such as unlearning and forgetting may
occur when the time interval is long resulting in low correlation of the test;
and (3) other varying environmental conditions may affect the correlation
of the test regardless of the time interval separating the two
administrations.
2. Parallel-Forms Method or Test of Equivalence
 The two forms are administered to the same group of subjects and the paired
observations correlated.
 The two forms of the test must be constructed in a manner that the
content, type of item, difficulty, instructions for administration and several
others, should be similar but not identical.
3. Split-Half Method
 The test is administered only once, but the test items are divided into two
halves.
 The common procedure is to divide a test into odd and even items.
 The two halves of the test must be similar but not identical in content,
difficulty, means and standard deviations.
4. Internal-Consistency Method
 This method is used with psychological tests that are constructed of
dichotomously scored items.
 The testee either passes or fails each item.
 Reliability coefficients under this method are obtained with the
Kuder-Richardson formula. (A sketch of these reliability estimates appears
after this list.)
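Here is a minimal sketch, in Python, of how three of the four methods above can
be computed. All score data are hypothetical, numpy is assumed to be available,
and the Spearman-Brown correction (the adjustment commonly paired with the
split-half method, though not named above) is used to project half-test
reliability to full length.

    import numpy as np

    def pearson(x, y):
        # Pearson correlation coefficient between two sets of scores.
        x, y = np.asarray(x, float), np.asarray(y, float)
        return float(np.corrcoef(x, y)[0, 1])

    # --- Test-retest: correlate the first and second administrations ---
    first  = [78, 85, 62, 90, 71, 66, 88]   # hypothetical scores, time 1
    second = [80, 83, 65, 92, 70, 68, 85]   # hypothetical scores, time 2
    test_retest_r = pearson(first, second)

    # --- Split-half: correlate odd and even items, then apply the
    #     Spearman-Brown correction to estimate full-test reliability ---
    items = np.array([                      # rows = examinees, cols = items (1/0)
        [1, 1, 0, 1, 1, 0, 1, 1],
        [1, 0, 0, 1, 0, 0, 1, 0],
        [1, 1, 1, 1, 1, 1, 1, 0],
        [0, 0, 0, 1, 0, 0, 0, 0],
        [1, 1, 0, 1, 1, 1, 1, 1],
    ])
    odd_half  = items[:, 0::2].sum(axis=1)  # items 1, 3, 5, 7
    even_half = items[:, 1::2].sum(axis=1)  # items 2, 4, 6, 8
    half_r = pearson(odd_half, even_half)
    split_half_r = 2 * half_r / (1 + half_r)

    # --- KR-20 (a Kuder-Richardson formula) for dichotomous items ---
    k = items.shape[1]                      # number of items
    p = items.mean(axis=0)                  # proportion passing each item
    q = 1 - p
    total_var = items.sum(axis=1).var(ddof=1)
    kr20 = (k / (k - 1)) * (1 - (p * q).sum() / total_var)

    print(f"test-retest r = {test_retest_r:.2f}")
    print(f"split-half (Spearman-Brown) = {split_half_r:.2f}")
    print(f"KR-20 = {kr20:.2f}")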
b. The Concept of Error in Assessment
The concept of error in assessment is critical to the understanding of reliability.
Conceptually, whenever something is assessed, an observed score or result is
produced. This observed score is the product of what the true score or real ability or
skill is plus some degree of error.

Observed score = True Score + Error

Thus, an observed score can be higher or lower than the true score, depending
on the nature of the error.
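For instance (hypothetical numbers), a student whose true score is 85 might
obtain an observed score of 82 when fatigue and item ambiguity subtract three
points, or 88 when lucky guesses add three. The sources of error are reflected
in Table 2.2.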

Table 2.2

SOURCES OF ERROR

Internal Error                   External Error

Health                           Directions
Mood                             Luck
Motivation                       Item ambiguity
Test-taking skills               Heat in the room
Anxiety                          Lighting
Fatigue                          Sample of items
General ability                  Observer differences and bias
                                 Test interpretation and scoring

c. Test Reliability Enhancers


i. Use a sufficient number of items or tasks. A longer test is more reliable.
ii. Use independent raters or observers who can provide similar scores to the same
performances.
iii. Make sure the assessment procedures and scoring are objective.
iv. Continue the assessment until the results are consistent.
v. Eliminate or reduce the influence of extraneous events or factors.
vi. Assess the difficulty level of the test.
vii. Use shorter assessments more frequently rather than a few long assessments.
5. Fairness
 This pertains to the intent that each question should be made as clear as possible to the
examinees and that the test is free of any bias.
 An example of a bias in an intelligence test is an item about a person or object that has
not been part of the cultural and educational context of the test taker.
 In mathematical tests for instance, the reading difficulty level of an item can be a source
of unfairness.
 Identified elements of fairness are the student's knowledge of the learning targets before
instruction, the opportunity to learn, the assessment tasks and procedures, and teachers
who avoid stereotypes.
6. Positive Consequences
Positive consequences enhance the overall quality of assessment, particularly through the
effect of assessments on the students' motivation and study habits.
7. Practicality and Efficiency
 Assessments need to take into consideration the teacher’s familiarity with the method,
the time required, the complexity of administration, the ease of scoring and
interpretation, and the cost to be able to determine an assessment’s practicality and
efficiency.
 Administrability requires that a test must be administered with ease, clarity, and
uniformity.
 Directions must be specific so that students and teachers will understand exactly what
they must do.
 Scorability demands that a good test should be easy to score.
 The test results should readily be available to both students and teachers for remedial
and follow-up measures.

Productive Uses of Tests


1. Learning Analysis
 Tests are used to identify the reasons or causes why students do not learn and the
solutions that will help them learn.
 Ideally, a test should be designed to determine what students do not know so that
the teachers can take appropriate actions.
2. Improvement of Curriculum
 Poor performance in a test may indicate that the teacher is not explaining the
material effectively, the textbook is not clear, the students are not properly taught,
and the students do not see the meaningfulness of the materials.
 When only a few students have difficulties, the teacher can address them separately
and extend special help.
 If the entire class does poorly, the curriculum needs to be revised or special units
need to be developed for the class to continue.
3. Improvement of Teachers
 In a reliable grading system, the class average is the grade the teacher has earned.
4. Improvement of Instructional Materials
 Tests measure how effective instructional materials are in bringing about intended
changes.
5. Individualization
 Effective tests always indicate differences in students’ learning. These can serve as
bases for individual help.
6. Selection
 When enrollment opportunity or any other opportunity is limited, a test can be used
to screen those who are qualified.
7. Placement
 Tests can be used to determine to which category a student belongs.
8. Guidance and Counseling
 Results from appropriate tests, particularly standardized tests, can help teachers
and counselors guide students in assessing future academic and career possibilities.
9. Selling and Interpreting the School to the Community
 Effective tests help the community understand what the students are learning, since
test items are representative of the content of instruction.
 Tests can also be used to diagnose general schoolwide weaknesses and strengths
that require community or government support.
10. Identification of Exceptional Children
 Tests can reveal exceptional students inside the classroom. More often than not,
these students are overlooked and left unattended.
11. Evaluation of Learning Program
 Ideally, tests should evaluate the effectiveness of each element in a learning
program, not just give blanket information about the total learning environment.

Unproductive Uses of Tests


1. Grading. Tests should not be used as the only determinants in grading a student. Most
tests do not accurately reflect a student's performance or true abilities. Poor performance
on a certain task may indicate not only failure but also the lack or absence of the needed
foundations.
2. Labeling. It is often a serious disservice to label a student, even if the label is positive.
Negative labels may lead students to believe the label and act accordingly. Positive
labels, on the other hand, may lead students to underachieve in order to avoid standing
out as different, or to become overconfident and stop exerting effort.
3. Threatening. Tests lose their validity when used as disciplinary measures.
4. Unannounced Testing. Surprise tests are generally not recommended. More often than
not, they are resorted to by teachers who are unprepared, upset by an unruly class, or
reprimanded by superiors. Studies reveal that students perform at a slightly higher level
when tests are announced; unannounced tests create anxiety on the part of students,
particularly those who are already fearful of tests; unannounced tests do not give students
adequate time to prepare; and surprise tests do not promote efficient learning or higher
achievement.
5. Ridiculing. This means using tests to deride students.
6. Tracking. Students are grouped according to deficiencies as revealed by tests without
continuous reevaluation, thus locking them into categories.
7. Allocating Funds. Some schools exploit tests to solicit funding.

Classifications of Tests

1. Administration
a) Individual—given orally and requires the examiner's constant attention, since the
manner of answering may be as important as the score.
Example:
 Wechsler Adult Intelligence Scale
 PowerPoint presentation (used as a performance test in a speech class)
b) Group—used for measuring cognitive skills and achievement. Most tests in schools
are considered group tests, where different test takers can take the test as a group.
2. Scoring
a) Objective—independent scorers agree on the number of points the answer should
receive, e.g., multiple choice and true or false.
b) Subjective—answers can be scored in various ways and are given different values
by different scorers, e.g., essays and performance tests.
3. Sort of Response being emphasized
a) Power—allows examinees a generous time limit to be able to answer every item. The
questions are difficult and this difficulty is what is emphasized.
b) Speed—with severely limited time constraints but the items are easy and only a few
examinees are expected to make errors.
4. Types of Response the Examinees must Make
a) Performance—requires students to perform a task. This is usually administered
individually so that the examinee's performance on each task can be observed.
b) Paper-and-pencil—examinees are asked to write on paper.
5. What is Measured
a) Sample—a limited, representative test designed to estimate the total behavior of the
examinee, although no test can exhaustively measure all the knowledge of an
individual.
b) Sign test—a diagnostic test designed to obtain diagnostic signs suggesting that some
form of remediation is needed.
6. Nature of the Groups being Compared
a) Teacher-made test—for use within the classroom; it covers the subject matter being
taught by the same teacher who constructed the test.
b) Standardized test—constructed by test specialists working with curriculum experts and
teachers.

Other Types of Tests

1. Mastery tests measure the level of learning attained on a given set of materials.
2. Discriminatory tests distinguish the differences between students or groups of students. They
indicate the areas where students need help.
3. Recognition tests require students to choose the right answer from a given set of responses.
4. Recall tests require students to supply the correct answer from their memory.
5. Specific recall tests require short responses that are fairly objective.
6. Free recall tests require students to construct their own complex responses. There are no single
right answers, but a given answer might be better than another.
7. Maximum performance tests require students to obtain the best score possible.
8. Typical performance tests measure typical, usual, or average performance.
9. Written tests depend on the examinees’ ability to write. Logic is also required.
10. Oral examinations depend on the examinees’ ability to speak. Logic is also required.
11. Language tests require instructions and questions to be presented in words.
12. Non-language tests are administered by means of pantomime, painting or signs and symbols,
e.g., Raven’s Progressive Matrices or the Abstract Reasoning Tests.
13. Structured tests have very specific, well-defined instructions and expected outcomes.
14. Projective tests present ambiguous stimulus or questions designed to elicit highly individualized
responses.
15. Product tests emphasize only the final answer.
16. Process tests focus on how the examinees attack, solve, or work out a problem.
17. External reports are tests where a ratee is evaluated by another person.
18. Internal reports are self-evaluations.
19. Open book tests depend on one’s understanding and ability to express one’s ideas and evaluate
concepts.
20. Closed book tests depend heavily on the memory of the examinees.
21. Non-learning format tests determine how much information the students know.
22. Learning format tests require the students to apply previously learned materials.
23. Convergent format tests lead the examinees to a single correct or best answer.
24. Divergent format tests lead the examinees to several possible answers.
25. Scale measurements distribute ratings along a continuum.
26. Test measurements refer to items that are dichotomous: either right or wrong, but not both.
27. Pretests measure how much is known about a material before it is presented.
28. Posttests measure how much has been learned after a learning material has been given.
29. Sociometrics reveal the interrelationship among members or the social structure of a group.
30. Anecdotal records reveal episodes of behavior that may indicate a profile of the student.

Table 2.3

COMPARISON BETWEEN TEACHER-MADE TESTS AND STANDARDIZED TESTS

Characteristic: Directions for administration and scoring
  Teacher-Made Test: Usually, no uniform directions are specified.
  Standardized Test: Specific instructions standardize the administration and scoring
  procedures.

Characteristic: Sampling of content
  Teacher-Made Test: Both content and sampling are determined by the classroom teacher.
  Standardized Test: Content is determined by curriculum and subject matter experts. It
  involves intensive investigation of existing syllabi, textbooks, and programs. Sampling
  of content is done systematically.

Characteristic: Construction
  Teacher-Made Test: May be hurriedly done because of time constraints; often no test
  blueprints, item tryouts, item analysis, or revision; quality of the test may be quite poor.
  Standardized Test: Uses meticulous construction procedures that include constructing
  objectives and test blueprints, employing item tryouts, item analysis, and item revisions.

Characteristic: Norms
  Teacher-Made Test: Only local classroom norms are available.
  Standardized Test: In addition to local norms, standardized tests typically make available
  national, school district, and school building norms.

Characteristic: Purpose and use
  Teacher-Made Test: Best suited for measuring particular objectives set by the teacher and
  for intraclass comparisons.
  Standardized Test: Best suited for measuring broader curriculum objectives and for
  interclass, school, and national comparisons.

Review Exercises

1. Explain why validity implies reliability but not the reverse.

2. List down your personal experiences of unfair assessments.

Reference:

Reganit, Arnulfo Aaron R., et al. Assessment of Student Learning 1 (Cognitive Learning).
Quezon City: C & E Publishing, Inc., 2010.
Prepared by
Ms. Cherry L. Ebano, LPT
