Modules in Assessment in Learning 1 For PRI
Assessment of Student
Learning 1
Preface
COVID-19 has affected the world at large, but this has
also given us a glimpse of the good that exists.
- Amit Gupta
Table of Contents
Foreword ii
Chapter 1 Outcomes-Based Education 1
Lesson 1 Understanding Outcomes-Based Education 1
Chapter 2 Introduction to Assessment in Learning 16
Lesson 1 Basic Concepts and Principles in Assessing Learning 16
Lesson 2 Assessment Purposes, Educational Objectives, Learning Targets and Appropriate Methods 32
Lesson 3 Classifications of Assessment 54
Chapter 3 Development and Enhancement of Tests 71
Lesson 1 Planning a Written Test 71
Lesson 2 Construction of Written Tests 90
Lesson 3 Improving a Classroom-Based Assessment 122
Lesson 4 Establishing Test Validity and Reliability 139
Chapter 4 Organization, Utilization, and Communication of Test Results 161
Lesson 1 Organization of Test Data Using Tables and Graphs 162
Lesson 2 Analysis, Interpretation, and Use of Test Data 191
Lesson 3 Grading and Reporting of Test Results 240
Appendix 1 Course Syllabus 278
CHAPTER 1
OUTCOMES-BASED EDUCATION
Overview
In response to the need for standardization of education systems and
processes, many higher education institutions in the Philippines shifted their
attention and efforts toward implementing the OBE system at the school level.
The shift to OBE has been propelled predominantly by its use as a framework
by international and local academic accreditation bodies in school- and
program-level accreditation, in which many schools invest considerable effort.
The Commission on Higher Education (CHED) even emphasized the need for
the implementation of OBE by issuing a memorandum order on the "Policy
Standard to Enhance Quality Assurance in Philippine Higher Education
through an Outcomes-Based and Typology-Based QA". Consequently, a
Handbook of Typology, Outcomes-Based Education, and Sustainability
Assessment was released in 2014.
Given the current status of OBE in the country, this lesson aims to
shed light on some critical aspects of the framework with the hope of
elucidating important concepts that will ensure proper implementation of OBE.
It also zeroes in on the implications of OBE implementation for the
assessment and evaluation of students' performance.
Objective
Upon completion of this chapter, the students can achieve a good
grasp of outcomes-based education.
Pre-discussion
Primarily, this chapter deals with the shift of educational focus from
content to learning outcomes, particularly in OBE: matching intentions
with the outcomes of education. The students can state and discuss this
change of educational focus from content to learning outcomes.
What to Expect?
At the end of the lesson, the students can:
1. discuss outcomes-based education, its meaning, brief history and
characteristics;
2. identify the procedures in the implementation of OBE in subjects or
courses; and
3. define outcomes and discuss each type of outcomes.
Meaning of Education
According to some learned people, the word education has been
derived from the Latin term "educatum", which means the act of teaching or
training. Other groups of educationalists say that it has come from another
Latin word “educare” which means to bring up or to raise. For a few others,
the word education has originated from another Latin word “educere” which
means to lead forth or to come out. All these meanings indicate that education
seeks to nourish the good qualities in man and draw out the best in every
individual; it seeks to develop the inner, innate capacities of man. By
educating an individual, we attempt to give him/her the knowledge, skills,
understanding, interests, attitudes, and critical thinking. That is, he/she
acquires knowledge of history, geography, arithmetic, language, and science.
Today, outcomes-based education is the main thrust of Higher
Education Institutions in the Philippines. The OBE comes in the form of
competency-based learning standards and outcomes-based quality
assurance monitoring and evaluation spelled out under CHED
Memorandum Order No. 46. Accordingly, CHED OBE differs from
Transformational OBE in several aspects. The CMO acknowledges that there
are two different OBE frameworks, namely the strong and the weak.
What is OBE?
Outcomes-Based Education (OBE) is a process that involves the
restructuring of curriculum, assessment and reporting practices in education
to reflect the achievement of high order learning and mastery rather than the
accumulation of course credits. It is a recurring education reform model, a
student-centered learning philosophy that focuses on empirically measuring
students' performances, which are called outcomes, and on the resources
that are available to students, which are called inputs.
Furthermore, Outcome-Based Education means clearly focusing and
organizing everything in an educational system around what is essential for all
students to be able to do successfully at the end of their learning experiences.
This means starting with a clear picture of what is important for students to be
able to do, then organizing the curriculum, instruction, and assessment to
make sure that this learning ultimately happens.
For education stalwart Dr. William Spady, Outcome-Based Education
(OBE) is a paradigm shift in the education system that is changing the way
students learn, teachers think, and schools measure excellence and success.
He came to the Philippines to introduce OBE and share its benefits. Spady
said that in conceptualizing OBE in 1968, he observed that the US education
system was bent mainly on making students achieve good scores. "So there
are graduates who pass exams, but lack skills. Then there are those who can
do the job well yet are not classic textbook learners." Furthermore, he said
that OBE is not concerned with a single standard for assessing the success of
an individual. "In OBE, real outcomes take us far beyond the paper-and-pencil
test." In the Philippines, learning materials are aligned with OBE through the
following features:
Learning Objectives - Statements that describe what learners/students are
expected to develop by the time they finish a particular chapter. This may
include the cognitive, psychomotor, and affective aspects of learning.
Teaching Suggestions - This section covers ideas, activities, and strategies
that are related to the topic and will help the instructor in achieving the
Learning Objectives.
Chapter Outline - This section shows the different topics/subtopics found in
each chapter of the textbook.
Discussion Questions - This section contains end-of-chapter questions that
will require students to use their critical thinking skills to analyze the
factual knowledge of the content and its application to actual human
experiences.
Experiential Learning Activities - This includes activities that are flexible in
nature. This may include classroom/field/research activities, simulation
exercises, and actual experiences in real-life situations.
Objective Tests - Objective tests of students' knowledge may include any of
the following:
- Identification
- True or False
- Fill in the Blank
- Matching Type
- Multiple Choice
Answer keys to the test questions must be provided.
Assessment for Learning - This may include rubrics that will describe and
evaluate the level of performance/expected outcomes of the learners.
Summary
The change in educational perspective is called Outcomes-Based
Education (OBE), which is characterized by the following:
- It is student-centered; that is, it places the students at the center of the
process by focusing on Student Learning Outcomes (SLO).
- It is faculty-driven; that is, it encourages faculty responsibility for
teaching, assessing program outcomes, and motivating participation
from the students.
- It is meaningful; that is, it provides data to guide the teacher in making
valid and continuing improvements in instruction and other assessment
activities.
To implement OBE in a subject or course, the teacher should identify
the educational objectives of the subject or course so that he/she can help
students develop and enhance their knowledge, skills, and attitudes; he/she
must then list all learning outcomes specified for each subject or course
objective. A good source of learning outcome statements is the taxonomy of
educational objectives by Benjamin Bloom, which is grouped into three
domains:
- the Cognitive, also called knowledge, refers to mental skills such as
remembering, understanding, applying, analyzing, evaluating,
synthesizing, and creating;
- the Psychomotor, also referred to as skills, includes manual or physical
skills, which proceed from mental activities and range from the simplest
to the complex, such as observing, imitating, practicing, adapting, and
innovating;
- the Affective, also known as attitude, refers to growth in feelings or
emotions, from the simplest behavior to the most complex, such as
receiving, responding, valuing, organizing, and internalizing.
ERNIE C. CERADO, PhD / MA. DULCE P. DELA CERNA, MIE
SULTAN KUDARAT STATE UNIVERSITY
The emphasis in an OBE system is on measured outcomes rather
than "inputs," such as how many hours students spend in class, or
what textbooks are provided. Outcomes may include a range of skills and
knowledge. Generally, outcomes are expected to be concretely measurable,
that is, "Student can run 50 meters in less than one minute" instead of
"Student enjoys physical education class." A complete system of outcomes for
a subject area normally includes everything from mere recitation of fact
("Students will name three tragedies written by Shakespeare") to complex
analysis and interpretation ("Student will analyze the social context of a
Shakespearean tragedy in an essay"). Writing appropriate and measurable
outcomes can be very difficult, and the choice of specific outcomes is often a
source of local controversies.
Learning outcomes describe the measurable skills, abilities, knowledge
or values that students should be able to demonstrate as a result of
completing a course. They are student-centered rather than teacher-centered,
in that they describe what the students will do, not what the instructor will
teach. They are not standalone statements. They must all relate to each other
and to the title of the unit and avoid repetition. Articulating learning outcomes
for students is part of good teaching. If you tell students what you expect them
to do, and give them practice in doing it, then there is a good chance that they
will be able to do it on a test or major assignment. That is to say, they will
have learned what you wanted them to know. If you do not tell them what they
will be expected to do, then they are left guessing what you want. If they
guess wrong, they will resent you for being tricky, obscure or punishing.
Finally, outcomes assessment procedures must also be drafted to
enable the teacher to determine the degree to which the students are
attaining the desired learning outcomes. For every outcome, it identifies the
data to be gathered, which will guide the selection of the assessment tools to
be used and the point at which assessment will be done.
Enrichment
Assessment
Activity 1. Fill out the matrix based on your findings about the Educational
Objectives (EO) and create your own Learning Outcomes (LO).
Activity 3. The following statements are incorrect. On the blank before each
number, write the letter of the section which makes the sentence wrong, and
on the blank after each number, re-write the wrong section to make the
sentence correct.
____1. (a) Because of knowledge explosion / (b) brought about by the use of
/ (c) computers in education / (d) the teacher ceased to be the sole source of
knowledge.
______________________________________________________________
______________________________________________________________
____2. (a) At present, / (b) the teacher is the giver of knowledge / (c) by
assisting / (d) in the organization of facts and information.
______________________________________________________________
______________________________________________________________
____3. (a) The change of focus / (b) in instruction / (c) from outcomes to
content / (d) is known as Outcomes-Based Education.
______________________________________________________________
______________________________________________________________
____5. (a) Education comes / (b) from the Latin root / (c) "educare" or
"educere" / (d) which means to "pour in".
______________________________________________________________
______________________________________________________________
____6. (a) In the past, / (b) the focus / (c) of instruction / (d) was learning
outcomes.
______________________________________________________________
______________________________________________________________
____7. (a) Ability to communicate / (b) in writing and speaking / (c) is an
example / (d) of deferred outcome.
______________________________________________________________
______________________________________________________________
____8. (a) The content and the outcome / (b) are the two / (c) main elements
/ (d) of the educative process.
______________________________________________________________
______________________________________________________________
Activity 4. Give the meaning of the following word or group of words. Write
your answers on the spaces provided after each number.
1. Outcomes-Based Education
________________________________________________________
________________________________________________________
________________________________________________________
________________________________________________________
2. Immediate Outcome
________________________________________________________
________________________________________________________
________________________________________________________
3. Deferred Outcome
________________________________________________________
________________________________________________________
________________________________________________________
4. Educational Objective
________________________________________________________
________________________________________________________
________________________________________________________
5. Learning Outcome
________________________________________________________
________________________________________________________
________________________________________________________
6. Student-Centered Instruction
________________________________________________________
________________________________________________________
________________________________________________________
7. Content-Centered Instruction
________________________________________________________
________________________________________________________
________________________________________________________
8. Psychomotor Skill
________________________________________________________
________________________________________________________
________________________________________________________
9. Cognitive Skill
________________________________________________________
________________________________________________________
________________________________________________________
References
De Guzman, E. and Adamos, J. (2015). Assessment of Learning 1. Quezon
City: Adriana Publishing Co., Inc.
Macayan, J. (2017). Implementing Outcome-Based Education (OBE)
Framework: Implications for Assessment of Students' Performance.
Educational Measurement and Evaluation Review, 8(1).
Navarro, R., Santos, R. and Corpuz, B. (2017). Assessment of Learning I (3rd
ed.). Metro Manila: Lorimar Publishing, Inc.
CHAPTER 2
INTRODUCTION TO ASSESSMENT IN LEARNING
Overview
A clear understanding of the course Assessment of Learning has to
begin with one's complete awareness of its fundamental terms and
principles. Most importantly, a good grasp of concepts like assessment,
learning, evaluation, measurement, testing, and test is requisite knowledge
for every pre-service teacher. Sufficient information about these pedagogic
elements would certainly heighten his or her confidence in teaching. The
principles behind assessment likewise need to be studied, as all activities
related to it must be properly grounded; otherwise, assessment becomes
unsound and meaningless. Objective, content, method, tool, criterion,
recording, procedure, feedback, and judgment are some significant factors
that must be considered to undertake quality assessment.
Objective
Upon completion of the unit, the students can discuss the fundamental
concepts, principles, purposes, roles and classifications of assessment, as
well as align the assessment methods to learning targets.
Pre-discussion
Study the picture in Figure 1. Has this something to do with assessment?
What are your comments?
What to Expect?
At the end of the lesson, the students can:
1. make a personal definition of assessment;
2. compare assessment with measurement and evaluation;
3. discuss testing and grading;
4. explain the different principles in assessing learning;
5. relate an experience as a student or pupil related to each principle;
6. comment on the tests administered by the past teachers; and
7. perform simple evaluation.
What is assessment?
Definitions of assessment can be gathered from varied sources.
Meaning of Learning
We all know that the human brain is immensely complex and still
somewhat of a mystery. It follows, then, that learning, as a primary function of
the brain, is understood in many different senses.
To provide you sufficient insight into the term, here are several ways in
which learning can be described:
1. "A change in human disposition or capability that persists over a period of
time and is not simply ascribable to processes of growth." (From The
Conditions of Learning by Robert Gagne)
2. Learning is the relatively permanent change in a person’s knowledge or
behavior due to experience. This definition has three components: 1) the
duration of the change is long-term rather than short-term; 2) the locus of
the change is the content and structure of knowledge in memory or the
behavior of the learner; 3) the cause of the change is the learner’s
experience in the environment rather than fatigue, motivation, drugs,
physical condition or physiologic intervention. (From Learning in
Encyclopedia of Educational Research, Richard E. Mayer)
3. It has been suggested that the term learning defies precise definition
because it is put to multiple uses. Learning is used to refer to (1) the
acquisition and mastery of what is already known about something, (2) the
extension and clarification of the meaning of one's experience, or (3) an
organized, intentional process of testing ideas relevant to problems.
(See Figures 2, 3, and 4.)
You may be thinking that learning to bake cookies and learning
something like Chemistry are not the same at all. In a way, you are right;
however, the information you get from assessing what you have learned is the
same. Brian used what he learned from each batch of cookies to improve the
next batch. You likewise learn from every homework assignment that you
complete, and every quiz you take shows what you still need to study to know
the material.
Models in Assessment
The two most common psychometric theories that serve as frameworks
for assessment and measurement, especially in the determination of the
psychometric characteristics of a measure (e.g., tests, scales), are the
classical test theory (CTT) and the item response theory (IRT).
The CTT, also known as the true score theory, explains that variations
in examinees' performance on a given measure are due to variations in
their abilities. It assumes that an examinee's observed score on a given
measure is the sum of the examinee's true score and some degree of error
in the measurement caused by internal and external conditions. Hence,
the CTT also assumes that all measures are imperfect and that the scores
obtained from a measure could differ from the true score (i.e., the true ability
of an examinee).
The CTT provides an estimation of item difficulty based on the number
of examinees who correctly answer a particular item; items that fewer
examinees answer correctly are considered more difficult. It also provides an
estimation of item discrimination based on
the number of examinees with higher or lower ability to answer a particular
item. If an item is able to distinguish between examinees with higher ability
(i.e., higher total test score) and lower ability (i.e., lower total test score), then
an item is considered to have good discrimination. Test reliability can also be
estimated using approaches from CTT (e.g., Kuder-Richardson 20,
Cronbach’s alpha). Item analysis based on this theory has been the dominant
approach because of the simplicity of calculating the statistics (e.g., item
difficulty index, item discrimination index, item-total correlation).
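The CTT statistics named above can be sketched in a few lines of Python. This is an illustrative example, not part of the module: the score matrix, the function names, and the 27% upper-lower split used for discrimination are all assumptions, and the reliability shown is the Kuder-Richardson 20 formula mentioned in the text.

```python
# Illustrative CTT item analysis. Rows are examinees, columns are items;
# 1 = correct, 0 = incorrect. All data here are hypothetical.

def item_difficulty(scores, item):
    """Difficulty index: proportion of examinees answering the item
    correctly (a higher value means an easier item)."""
    answers = [row[item] for row in scores]
    return sum(answers) / len(answers)

def item_discrimination(scores, item):
    """Upper-lower discrimination index: p(upper 27% by total score)
    minus p(lower 27%)."""
    ranked = sorted(scores, key=sum, reverse=True)
    k = max(1, round(len(ranked) * 0.27))
    upper, lower = ranked[:k], ranked[-k:]
    p_upper = sum(row[item] for row in upper) / k
    p_lower = sum(row[item] for row in lower) / k
    return p_upper - p_lower

def kr20(scores):
    """Kuder-Richardson 20 reliability estimate for dichotomous items."""
    n_items = len(scores[0])
    totals = [sum(row) for row in scores]
    mean = sum(totals) / len(totals)
    var = sum((t - mean) ** 2 for t in totals) / len(totals)
    pq = sum(
        item_difficulty(scores, j) * (1 - item_difficulty(scores, j))
        for j in range(n_items)
    )
    return (n_items / (n_items - 1)) * (1 - pq / var)

scores = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 0, 1],
]
print(item_difficulty(scores, 0))     # proportion correct on item 1
print(item_discrimination(scores, 0)) # upper-lower index for item 1
print(kr20(scores))                   # test reliability estimate
```

The simplicity of these calculations is exactly why the text notes that CTT-based item analysis has been the dominant approach.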
The IRT, on the other hand, analyzes test items by estimating the
probability that an examinee answers an item correctly or incorrectly. One of
the central differences of IRT from CTT is that in IRT, it is assumed that the
characteristics of an item can be estimated independently of the characteristics
or ability of an examinee, and vice versa. Aside from item difficulty and item
discrimination, IRT models can also estimate other item parameters, such as
a guessing parameter.
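As a hedged illustration of the probability estimation the paragraph above describes (not taken from the module; the parameter values are hypothetical), the two-parameter logistic (2PL) IRT model expresses the chance of a correct response as a function of examinee ability and the item's difficulty and discrimination:

```python
import math

def p_correct(theta, a, b):
    """2PL item response function: probability that an examinee of
    ability theta answers correctly an item with discrimination a
    and difficulty b (all on the same latent scale)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# An examinee of average ability (theta = 0) facing an item of average
# difficulty (b = 0) has a 50% chance of answering correctly.
print(p_correct(0.0, 1.0, 0.0))  # 0.5
# A harder item (b = 1.5) lowers that probability for the same examinee.
print(p_correct(0.0, 1.0, 1.5))
```

Because theta and the item parameters enter the function separately, item characteristics can in principle be estimated independently of any particular group of examinees, which is the contrast with CTT drawn above.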
Types of Assessment
The most common types of assessment are diagnostic, formative and
summative, criterion-referenced and norm-referenced, traditional and
authentic. Other experts added ipsative and confirmative assessments.
Pre-assessment or diagnostic assessment
Before creating the instruction, it is necessary to know the kind of
students for whom you are creating it. Your goal is to get to know your
students' strengths and weaknesses and the skills and knowledge they
possess before undergoing the instruction. Based on the data you have
collected, you can create your instruction. Usually, a teacher conducts a
pre-test to diagnose the learners.
Formative assessment
Formative assessment is continuous assessment done several times
during the instructional process for the purpose of improving teaching or
learning (Black & Wiliam, 2003).
Summative assessment
Summative assessments are quizzes, tests, exams, or other formal
evaluations of how much a student has learned throughout a subject. The
goal of this assessment is to get a grade that corresponds to a student’s
understanding of the class material as a whole, such as with a midterm or
cumulative final exam.
Confirmative assessment
When your instruction has been implemented in your classroom, it is
still necessary to find out whether it remains effective over time; a
confirmative assessment, conducted well after the instruction (for example, a
year later), checks whether the instruction is still successful.
Criterion-referenced assessment
It measures students' performance against a fixed set of predetermined
criteria or learning standards (see Figure 6). It checks what students are
expected to know and be able to do at a specific stage of their education.
Criterion-referenced tests are used to evaluate a specific body of knowledge
or skill set; such a test evaluates the curriculum taught in a course. In
practice, these assessments are designed to determine whether students
have mastered the material presented in a specific unit. Each student's
performance is measured based on the subject matter presented (what the
student knows and what the student does not know). Again, all students can
get 100% if they have fully mastered the material.
Ipsative assessment
Principles of Assessment
There are many principles in the assessment of learning. Different
literature provides its own yet closely related set of principles of
assessment. According to David et al. (2020), the following may be
considered core principles in assessing learning:
1. Assessment should have a clear purpose. The methods used in
collecting information should be based on this purpose. The
interpretation of the data collected should be aligned with the purpose
that has been set. This principle is congruent with the outcome-based
education (OBE) principles of clarity of focus and design down.
2. Assessment is not an end in itself. It serves as a means to enhance
student learning. It is not a simple recording or documentation of what
learners know and do not know; collecting information about student
learning must lead to decisions and actions that improve it.
Summary
Assessment
1. What is assessment in learning? What does assessment in learning mean to you personally?
2. Differentiate the following:
2.1. Measurement and evaluation
2.2. Testing and grading
2.3. Formative and summative assessment
2.4. Classical test theory and Item response theory
3. Based on the principles that you have learned, make a simple plan on how you will
undertake your assessment with your future students. Consider 2 principles only.
Principles Plan for applying the principle in your classroom
assessment
1.
2.
Enrichment
Secure a copy of DepEd Order No. 8, s. 2015 on the Policy Guidelines on
Classroom Assessment for the K to 12 Basic Education Program. Study
the policies and be ready to clarify any provisions during G-class. You can
access the Order from this link: https://www.deped.gov.ph/2015/04/01/do-
8-s-2015-policy-guidelines-on-classroom-assessment-for-the-k-to-12-
basic-education-program/
Read DepEd Order No. 5, s. 2013 (Policy Guidelines on the
Implementation of the School Readiness Year-end Assessment (SReYA)
for Kindergarten. (Please access through
https://www.deped.gov.ph/2013/01/25/do-5-s-2013-policy-guidelines-on-
the-implementation-of-the-school-readiness-year-end-assessment-sreya-
for-kindergarten/).
Questions
1. What assessment is cited in the Order? What is the purpose of giving
such assessment?
2. How would you classify the assessment in terms of its nature? Justify.
3. What is the relevance of this assessment to students, parents and
teachers and the school?
References
Pre-discussion
To achieve the intended learning outcomes of this lesson, one is
required to understand the basic concepts, theories, and principles in
assessing the learning of students. Should these not yet be clear and
understood, a thorough review of the previous chapter is advised.
What to Expect?
At the end of the lesson, the students can:
1. articulate the purpose of classroom assessment;
2. tell the difference between the Bloom's Taxonomy and the Revised
Bloom's Taxonomy in stating learning objectives;
3. apply the Revised Bloom’s Taxonomy in writing learning objectives;
4. discuss the importance of learning targets in instruction;
5. formulate learning targets; and
6. match the assessment methods with specific learning
objectives/targets.
Assessment for Learning (Formative Assessment) vs. Assessment of
Learning (Summative Assessment)
- Formative: Checks learning to determine what to do next and then provides
suggestions of what to do - teaching and learning are indistinguishable from
assessment. Summative: Checks what has been learned to date.
- Formative: Is designed to assist educators and students in improving
learning. Summative: Is designed for the information of those not directly
involved in daily learning and teaching (school administration, parents, school
board, Alberta Education, post-secondary institutions) in addition to educators
and students.
- Formative: Is used continually by providing descriptive feedback.
Summative: Is presented in a periodic report.
- Formative: Usually uses detailed, specific and descriptive feedback - in a
formal or informal report. Summative: Usually compiles data into a single
number, score or mark as part of a formal report.
- Formative: Is not reported as part of an achievement grade. Summative: Is
reported as part of an achievement grade.
- Formative: Usually focuses on improvement, compared with the student's
"previous best" (self-referenced, making learning more personal).
Summative: Usually compares the student's learning either with other
students' learning (norm-referenced, making learning highly competitive) or
the standard for a grade level (criterion-referenced, making learning more
collaborative and individually focused).
- Formative: Involves the student. Summative: Does not always involve the
student.
Adapted from Ruth Sutton, unpublished document, 2001, in Alberta
Assessment Consortium, Refocus: Looking at Assessment for Learning
(Edmonton, AB: Alberta Assessment Consortium, 2003), p. 4.
Analyze - Breaking down information into parts. Sample verbs: analyze,
calculate, examine, test, compare, differentiate, organize, classify. Sample
task: Classify the following chemical elements based on some
categories/areas.
Apply - Applying the facts, rules, concepts and ideas in another context.
Sample verbs: apply, employ, practice, relate, use, implement, carry out,
solve. Sample task: Solve the following problems using the different
measures of central tendency.
Understand - Understanding what the information means. Sample verbs:
describe, determine, interpret, translate, paraphrase, explain. Sample task:
Explain the causes of malnutrition in the country.
Remember - Recognizing and recalling facts. Sample verbs: identify, list,
name, underline, recall, retrieve, locate. Sample task: Name the 7th president
of the Philippines.
LEARNING TARGETS
“Students who can identify what they are learning significantly outscore
those who cannot.” – Robert Marzano
The metaphor that Connie Moss and Susan Brookhart use to describe
learning targets in their Educational Leadership article, "What Students Need
to Learn," is that of a global positioning system (GPS). Much like a GPS
communicates timely information about where you are, how far and how long
until your destination, and what to do when you make a wrong turn, a learning
target provides a precise description of the learning destination. Learning
targets tell students what they will learn, how deeply they will learn it, and
how they will demonstrate their learning.
Learning targets describe in student-friendly language the learning to
occur in the day’s lesson. Learning targets are written from the students’ point
of view and represent what both the teacher and the students are aiming for
during the lesson. Learning targets also include a performance of
understanding, or learning experience, that provides evidence to answer the
question “What do students understand and what are they able to do?”
As Moss and Brookhart write, while a learning target is for a daily
lesson, “Most complex understandings require teachers to scaffold student
understanding across a series of interrelated lessons.” In other words, each
learning target is a part of a longer, sequential plan that includes short and
long-term goals.
McMillan (2014) defined learning targets as statements of student
performance for a relatively restricted type of learning outcome that will be
achieved in a single lesson or a few days; they contain what students should
know, understand, and be able to do at the end of the instruction, and the
criteria for judging the level of demonstrated performance. Learning targets
are more specific and clearer than educational goals, standards, and learning
objectives.
Teacher Observation
Teacher observation has been accepted readily in the past as a
legitimate source of information for recording and reporting student
demonstrations of learning outcomes. As the student progresses to later
years of schooling, less and less attention typically is given to teacher
observation and more and more attention typically is given to formal
assessment procedures involving required tests and tasks taken under explicit
constraints of context and time. However, teacher observation is capable of
providing substantial information on student demonstration of learning
outcomes at all levels of education.
For teacher observation to contribute to valid judgments concerning
student learning outcomes, evidence needs to be gathered and recorded
systematically. Systematic gathering and recording of evidence requires
preparation and foresight. Teacher observation can be characterised as two
types: incidental and planned.
Incidental observation occurs during the ongoing (deliberate) activities of
teaching and learning and the interactions between teacher and students.
In other words, an unplanned opportunity emerges, in the context of
classroom activities, where the teacher observes some aspect of
individual student learning. Whether incidental observation can be used
as a basis for formal assessment and reporting may depend on the
records that are kept.
Planned observation involves deliberate planning of an opportunity for the
teacher to observe specific learning outcomes.
Student Self-Assessment
One form of formative assessment is self-assessment or self-reflection
by students. Self-reflection is the evaluation or judgment of the worth of one’s
performance and the identification of one’s strengths and weaknesses with a
view to improving one’s learning outcomes, or more succinctly, reflecting on
and monitoring one’s own work processes and/or products (Klenowski, 1995).
Student self-assessment has long been encouraged as an educational and
learning strategy in the classroom, and is both popular and positively
regarded by the general education community (Andrade, 2010).
Similarly, McMillan and Hearn (2008) described self-assessment as a
process by which students 1) monitor and evaluate the quality of their thinking
and behavior when learning and 2) identify strategies that improve their
understanding and skills. That is, self-assessment occurs when students
judge their own work to improve performance as they identify discrepancies
between current and desired performance. This aspect of self-assessment
aligns closely with standards-based education, which provides clear targets
and criteria that can facilitate student self-assessment. The pervasiveness of
standards-based instruction provides an ideal context in which these clear-cut
benchmarks for performance and criteria for evaluating student products,
when internalized by students, provide the knowledge needed for self-
assessment. Finally, self-assessment identifies further learning targets and
instructional strategies (correctives) students can apply to improve
achievement.
Summary
In an educational setting, the purpose of assessment may be classified in
terms of assessment of learning, assessment for learning, and
assessment as learning.
Assessment OF learning is held at the end of a subject or a course to
determine performance. It is equivalent to summative assessment.
Assessment FOR learning is done repeatedly during instruction to check
the learners’ progress and the teacher’s strategies so that interventions or
changes can be made.
Assessment AS learning is done to develop the learners’ independence
and self-regulation.
Assessment
1. Describe the 3 purposes of classroom assessment by completing the
matrix below.
Assessment OF learning | Assessment FOR learning | Assessment AS learning
WHAT?
WHY?
WHEN?
Sample statements
Enrichment
Open DepEd’s K to 12 Curriculum Guide from this link:
https://www.deped.gov.ph/k-to-12/about/k-to-12-basic-education-
curriculum/grade-1-to-10-subjects/ and familiarize yourself with the
content standards, performance standards, and competencies.
Choose a specific lesson for a subject area, and grade level that you want
to teach in the future. Prepare an assessment plan using the matrix.
Subject
Grade level
Performance standards
Specific lesson
Learning targets
Assessment task/activity
How will the results improve your instruction?
References
Andrade, H. (2010). Students as the definitive source of formative
assessment: Academic self-assessment and the self-regulation of
learning. In H. Andrade & G. Cizek (Eds.), Handbook of formative
assessment (pp. 90–105). New York, NY: Routledge.
Clayton, Heather. “Power Standards: Focusing on the Essential.” Making the
Standards Come Alive! Alexandria, VA: Just ASK Publications, 2016.
Access at www.justaskpublications.com/just-ask-resource-center/e-
newsletters/msca/power-standards/
David et al. (2020). Assessment in Learning 1. Manila: Rex Book Store.
De Guzman, E. and Adamos, J. (2015). Assessment of Learning 1. Quezon
City: Adriana Publishing Co., Inc.
EL Education (2020). Students Unpack a Learning Target and Discuss
Academic Vocabulary. [Video]. https://vimeo.com/44052219
Hattie, John. Visible Learning for Teachers: Maximizing Impact on Learning.
New York: Routledge, 2012.
Klenowski, V. (1995). Student self-evaluation processes in student-centred
teaching and learning contexts of Australia and England. Assessment
in Education: Principles, Policy & Practice, 2(2).
Maxwell, Graham S. (2001). Teacher Observation in Student Assessment.
(Discussion Paper). The University of Queensland.
Moss, Connie and Susan Brookhart. Learning Targets: Helping Students Aim
for Understanding in Today’s Lesson. Alexandria: ASCD, 2012.
Navarro, L., Santos, R. and Corpuz, B. (2017). Assessment of Learning 1 (3rd
ed.). Quezon City: Lorimar Publishing, Inc.
Pre-discussion
Ask the students about their experiences when they took the National
Achievement Test (NAT) during their elementary and high school days. Who
administered it? How did they answer it? What do they think was the
purpose of the NAT? What about their experiences in taking quarterly tests or
quizzes? What other assessments or tests did they take before? What are
their notable experiences relative to taking tests?
What to Expect?
At the end of the lesson, the students can:
1. compare the following forms of assessment: educational vs.
psychological, teacher-made vs. standardized, selected-response vs.
constructed-response, achievement vs. aptitude, and power vs. speed;
2. give examples of each classification of test;
3. illustrate situations on the use of different classifications of
assessment; and
4. decide on the kind of assessment to be used.
Classifications of Assessment
The different forms of assessment are classified according to purpose, form,
function, interpretation of learning, ability, and kind of learning.
Classification | Type
Purpose | Educational and Psychological
Form | Paper-and-pencil and Performance-based
Function | Teacher-made and Standardized
Performance-based Assessment
The following six (6) types of activities provide good starting points for
assessments in performance-based learning.
1. Presentations
One easy way to have students complete a performance-based activity
is to have them do a presentation or report of some kind. This activity could
be done by students individually, which takes more time, or in collaborative
groups.
The basis for the presentation may be one of the following:
Providing information
Teaching a skill
Reporting progress
Persuading others
Students may choose to add visual aids, such as a PowerPoint or Google
Slides presentation, to help illustrate elements of their speech.
Presentations work well across the curriculum as long as there is a clear set
of expectations for students to work with from the beginning.
2. Portfolios
Student portfolios can include items that students have created and
collected over a period of time. One example is the art portfolio that students
prepare when applying to art programs in college. Another example is when
students create a portfolio of their written work that shows how they have
progressed from the beginning to the end of a class. The writing in a portfolio
can be from any discipline or a combination of disciplines.
Some teachers have students select the items they feel represent
their best work to be included in a portfolio. The benefit of an activity like this
is that it is something that grows over time and is therefore not just completed
and forgotten. A portfolio can provide students with a lasting selection of
artefacts that they can use later in their academic career.
Reflections may be included in student portfolios in which students may
make a note of their growth based on the materials in the portfolio.
3. Performances
Dramatic performances are one kind of collaborative activity that can
be used as a performance-based assessment. Students can create, perform,
and/or provide a critical response. Examples include dance, recitals, dramatic
enactments, and prose or poetry interpretation.
This form of performance-based assessment can take time, so there
must be a clear pacing guide. Students must be provided time to address the
demands of the activity; resources must be readily available and meet all
safety standards. Students should have opportunities to draft stage work and
practice.
Developing the criteria and the rubric and sharing these with students
before evaluating a dramatic performance is critical.
4. Projects
Projects are commonly used by teachers as performance-based
activities. They can include everything from research papers to artistic
representations of information learned. Projects may require students to apply
their knowledge and skills while completing the assigned task. They can be
aligned with the higher levels of creativity, analysis, and synthesis.
Students might be asked to complete reports, diagrams, and maps.
Teachers can also choose to have students work individually or in groups.
6. Debates
A debate in the classroom is one form of performance-based learning
that teaches students about varied viewpoints and opinions. Skills associated
with debate include research, media and argument literacy, reading
comprehension, evidence evaluation, public speaking, and civic skills.
Standardized Test
A standardized test is a test that is given to students in a very
consistent manner. It means that the questions on the test are all the same,
the time given to each student is also the same, and the way in which the test
is scored is the same for all students. Standardized tests are constructed by
experts along with explicit instructions for administration, standard scoring
procedures, and a table of norms for interpretation.
Thus, a standardized test is administered and scored in a consistent or
"standard" manner. These tests are designed in such a way that the
questions, conditions for administering, scoring procedures, and
interpretations are consistent.
Any test in which the same test is given in the same manner to all test
takers, and graded in the same manner for everyone, is a standardized test.
Aptitude Test
Unlike achievement tests, which are concerned with a person's level of
skill or knowledge at a given time, aptitude tests are instead focused on
determining how capable a person might be of performing a certain task.
An aptitude test is designed to assess what a person is capable of
doing or to predict what a person is able to learn or do given the right
education and instruction. It represents a person's level of competency to
perform a certain type of task. Such aptitude tests are often used to assess
academic potential or career suitability and may be used to assess either
mental or physical talent in a variety of domains.
Some examples of aptitude tests include:
• A test assessing an individual's aptitude to become a fighter pilot
• A career test evaluating a person's capability to work as an air traffic
controller
In other tests, the time limits are short enough to make rate of work an
important factor in the score; these are called speed tests.
In the context of educational measurement, a power test usually refers
to a measurement tool composed of several items and applied without a
relevant time limit. The respondents have a very long time, or even unlimited
time, to solve each of the items, so they can usually attempt all of them. The
total score is often computed as the number of items correctly answered, and
individual differences in the scores are attributed to differences in the ability
under assessment, not to differences in basic cognitive abilities such as
processing speed or reaction time.
An example of a speed test is a typing test in which examinees are
required to type correctly as many words as possible given a limited amount
of time. An example of a power test is one developed by the National
Council of Teachers of Mathematics, which determines the ability of the
examinees to use data to reason and be creative, and to formulate, solve,
and reflect critically on the problems provided.
Summary
In this lesson, we identified and distinguished from one another the different
classifications of assessment. We learned when to use educational and
psychological assessment, or paper-and-pencil and performance-based
assessment. We were also able to differentiate teacher-made and
standardized tests, achievement and aptitude tests, as well as speed and
power tests.
Assessment
1. Which classification of assessment is commonly used in the classroom
setting? Why?
2. To demonstrate understanding, try giving more examples for each type of assessment.
Type Examples
Educational
Psychological
Paper and pencil
Performance-based
Teacher-made
Standardized
Achievement
Aptitude
Speed
Power
Norm-referenced
Criterion-referenced
3. Match the learning target with the appropriate assessment methods.
Check if the type of assessment is appropriate. Be ready to justify.
Learning targets | Selected-response | Essay | Performance Task | Teacher observation | Self-assessment
Example: Exhibit proper dribbling of a basketball | | | √ | √ | √
1. Identify parts of a microscope and its functions
2. Compare the methods of assessment
3. Arrange the eating utensils on a table
4. Perform the dance steps in “Pandanggo sa Ilaw”
5. Define assessment
6. Compare and contrast testing and grading
7. List down all the Presidents of the Philippines
8. Find the speed of a car
9. Recite the mission of SKSU
10. Prepare a lesson plan in Mathematics
Enrichment
Check the varied products of Center for Educational Measurement (CEM)
as regards standardized tests. Access it through this link:
https://www.cem-inc.org.ph/products
Try taking a free Personality Test available online. You can also try an IQ
test. Share the results with the class.
References
Aptitude Tests. Retrieved from https://www.aptitude-test.com/aptitude-
tests.html
Cherry, Kendra (2020, February 06). How Achievement Tests Measure What
People Have Learned. Retrieved from
https://www.verywellmind.com/what-is-an-achievement-test-2794805
Classroom Assessment. Retrieved from
https://fcit.usf.edu/assessment/selected/responseb.html
David et al. (2020). Assessment in Learning 1. Manila: Rex Book Store.
De Guzman, E. and Adamos, J. (2015). Assessment of Learning 1. Quezon
City: Adriana Publishing Co., Inc.
Improving your Test Questions. https://citl.illinois.edu/citl-101/measurement-
evaluation/exam-scoring/improving-your-test-questions?src=cte-
migration-map&url=%2Ftesting%2Fexam%2Ftest_ques.html
Navarro, L., Santos, R. and Corpuz, B. (2017). Assessment of Learning 1 (3rd
ed.). Quezon City: Lorimar Publishing, Inc.
University of Lethbridge (2020). Creating Assessments. Retrieved from
https://www.uleth.ca/teachingcentre/exams-and-assignments
CHAPTER 3
DEVELOPMENT AND ENHANCEMENT OF TEST
Overview
This chapter deals with the process and mechanics of developing a
written test, which is understandably a teacher-made type. As future professional
teachers, one has to be competent in the selection of the learning objectives
or outcomes, the preparation of a table of specifications (TOS), the guidelines in
writing varied written test formats, and the writing of the test itself. Adequate
knowledge of TOS construction is indispensable in formulating a test that is valid
in terms of content and construct. Also, a complete understanding of the
rules and guidelines in writing a specific test format helps ensure an
acceptable and unambiguous test that is fair to the learners. In addition,
reliability and validity, two important characteristics of a test, shall likewise
be covered to guarantee quality. For test item enhancement, topics such as
the difficulty index, the index of discrimination, and even distracter analysis are
introduced.
Objective
Upon completion of the unit, the students can demonstrate their
knowledge, understanding and skills in planning, developing and enhancing a
written test.
Pre-discussion
The setting of learning objectives for the assessment of a course or
subject and the construction of a table of specifications for a classroom
test require specific skills and experience. To successfully perform these
tasks, a pre-service teacher should be able to distinguish the different
levels of cognitive behavior and identify the appropriate assessment
method for them. It is assumed that in this lesson, the competencies for
instruction that are cognitive in nature are the ones identified as the targets in
developing a written test, which should be reflected in the test’s table of
specifications to be created.
What to Expect?
At the end of the lesson, the students can:
1. define the necessary instructional outcomes to be included in a written
test;
2. describe what is a table of specifications (TOS) and its formats;
3. prepare a TOS for a written test; and
4. demonstrate the systematic steps in making a TOS.
Instructional objectives play a crucial role in teaching and assessment.
They provide teachers the focus and direction on how the course
is to be handled, particularly in terms of course content, instruction, and
assessment. On the other hand, they provide the students with the reasons
and motivation to study and endure. They provide students the opportunities
to be aware of what they need to do to be successful in the course, take
control and ownership of their progress, and focus on what they should be
learning. Setting objectives for assessment is the process of establishing
direction to guide both the teacher in teaching and the student in learning.
For better understanding, Bloom has the following description for each
cognitive domain level:
Knowledge - Remember previously learned information
Comprehension - Demonstrate an understanding of the facts
Application - Apply knowledge to actual situations
Analysis - Break down objects or ideas into simpler parts and find
evidence to support generalizations
Synthesis - Compile component ideas into a new whole or propose
alternative solutions
Evaluation - Make and defend judgments about the value of ideas or
materials based on a set of criteria
Bloom’s Definitions
Remembering - Exhibit memory of previously learned material by recalling
facts, terms, basic concepts, and answers.
Understanding - Demonstrate understanding of facts and ideas by
organizing, comparing, translating, interpreting, giving descriptions, and
stating main ideas.
Applying - Solve problems to new situations by applying acquired
knowledge, facts, techniques and rules in a different way.
Analyzing - Examine and break information into parts by identifying
motives or causes. Make inferences and find evidence to support
generalizations.
Evaluating - Present and defend opinions by making judgments about
information, validity of ideas, or quality of work based on a set of criteria.
Creating - Compile information together in a different way by combining
elements in a new pattern or proposing alternative solutions
Table of Specifications
A table of specifications (TOS), sometimes called a test blueprint, is
a tool used by teachers to design a written test. It is a table that maps out the
test objectives, contents, or topics covered by the test; the levels of cognitive
behavior to be measured; the distribution of items, number, placement, and
weights of test items; and the test format. It helps ensure that the course’s
intended learning outcomes, assessments, and instruction are aligned.
Generally, the TOS is prepared before a test is created. However, it is
ideal to prepare one even before the start of instruction. Teachers need to
create a TOS for every test that they intend to develop. The TOS is
important because it does the following:
Ensures that the instructional objectives and what the test captures
match
Ensures that the test developer will not overlook details that are
considered essential to a good test
Makes developing a test easier and more efficient
Ensures that the test will sample all important content areas and
processes
Is useful in planning and organizing
Offers an opportunity for teachers and students to clarify achievement
expectations.
1. Determine the test objectives. Select only the objectives that can be best
captured by a written test. There are objectives that are
not meant for a written test. For example, if you test the psychomotor
domain, it is better to do a performance-based assessment. There are also
cognitive objectives that are sometimes better assessed through
performance-based assessment. Those that require the demonstration or
creation of something tangible like projects would also be more
appropriately measured by performance-based assessment. For a written
test, you can consider cognitive, ranging from remembering to creating of
ideas that could be measured using common formats for testing, such as
multiple choice, alternative response test, matching type, and even essays
or open-ended tests.
2. Determine the coverage of the test. The next step in creating the TOS is
to determine the contents of the test. Only topics or contents that have
been discussed in class and are relevant should be included in the test.
3. Calculate the weight for each topic. Once the test coverage is
determined, the weight of each topic covered in the test is determined. The
weight assigned per topic in the test is based on the relevance and the
time spent to cover each topic during instruction. The percentage of time
for a topic in a test is determined by dividing the time spent on that topic
by the total time spent for all the topics covered in the test. For example,
for a test on the Theories of Personality for a General Psychology 101
class, the teacher spent from half an hour to one and a half hours of class
sessions per topic. As such, the weight for each topic is as follows:
4. Determine the number of items for the whole test. To determine the
number of items to be included in the test, the amount of time needed to
answer the items is considered. As a general rule, students are given 30-
60 seconds for each item in test formats with choices. For a one-hour class,
this means that the test should not exceed 60 items. However, because
you also need to give time for test paper/booklet distribution and giving
instructions, the number of items should be less, maybe just 50 items.
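The rule of thumb above can be sketched as a small computation. This is only an illustration: the function name and the ten-minute allowance for distribution and instructions are assumptions, not fixed rules.

```python
# Sketch: estimate how many choice-type items fit in a class period,
# using the 30-60 seconds-per-item rule of thumb described above.
# The 10-minute administration allowance is an illustrative assumption.

def max_items(period_minutes: int, seconds_per_item: int = 60,
              admin_minutes: int = 10) -> int:
    """Return the number of items that fit in the remaining testing time."""
    testing_seconds = (period_minutes - admin_minutes) * 60
    return testing_seconds // seconds_per_item

print(max_items(60))  # 50 items for a one-hour class at 60 s per item
```

At 30 seconds per item the same one-hour period would accommodate about 100 items, which is why the rule is stated as a range.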
5. Determine the number of items per topic. To determine the number of
items per topic, the weights per topic are considered. Thus, using
the examples above, for a 50-item final test, Theory & Concepts, Humanistic
Theories, Cognitive Theories, Behavioral Theories, and Social Learning Theories
will each have 5 items; Trait Theories, 10 items; and Psychoanalytic Theories,
15 items.
Topic | Percent of Time (Weight) | No. of Items
Theory & Concepts | 10.0 | 5
Psychoanalytic Theories | 30.0 | 15
Trait Theories | 20.0 | 10
Humanistic Theories | 10.0 | 5
Cognitive Theories | 10.0 | 5
Behavioral Theories | 10.0 | 5
Social Learning Theories | 10.0 | 5
Total | 100 | 50 items
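Steps 3 and 5 can be sketched together in a short computation. The hours per topic below are an assumed breakdown consistent with the weights shown in the table; only the resulting weights and item counts come from the example itself.

```python
# Sketch: derive topic weights from time spent (step 3) and allocate
# items per topic for a 50-item test (step 5). The hours below are an
# assumed breakdown consistent with the weights in the table above.

hours_per_topic = {
    "Theory & Concepts": 0.5,
    "Psychoanalytic Theories": 1.5,
    "Trait Theories": 1.0,
    "Humanistic Theories": 0.5,
    "Cognitive Theories": 0.5,
    "Behavioral Theories": 0.5,
    "Social Learning Theories": 0.5,
}
total_hours = sum(hours_per_topic.values())  # 5.0 hours of instruction
total_items = 50

for topic, hours in hours_per_topic.items():
    weight = hours / total_hours             # step 3: percent of time
    items = round(weight * total_items)      # step 5: items per topic
    print(f"{topic}: {weight:.0%} -> {items} items")
```

Running this reproduces the table: 10% of the time yields 5 items, 20% yields 10, and 30% yields 15, for a total of 50.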
2. Two-Way TOS. A two-way TOS reflects not only the content, time spent,
and number of items but also the levels of cognitive behavior targeted per
test content based on the theory behind cognitive testing. For example, the
common framework for testing at present in the DepEd Classroom
Assessment Policy is the Revised Bloom’s Taxonomy (DepEd, 2015). One
advantage of this format is that it allows one to see the levels of cognitive
skills and dimensions of knowledge that are emphasized by the test. It also
shows the framework of assessment used in the development of the test.
Nonetheless, this format is more complex than the one-way format.
Content | Time Spent | No. & Percent of Items | KD* | Level of Cognitive Behavior, Item Format, No. and Placement of Items (R / U / AP / AN / E / C)
Theories and Concepts | 0.5 hours | 5 (10.0%) | F: I.3, #1-3; C: I.2, #4-5
Psychoanalytic Theories | F: I.2, #6-7; C: I.2, #8-9 and I.2, #10-11; P: I.2, #12-13 and I.2, #14-15; M: I.3, #16-18; II.1, #41; II.1, #42
Others
Scoring | 1 point per item | 2 points per item | 3 points per item
Overall Total | 5 | 50 (100.0%) | 20 | 20 | 10
Another presentation is shown below:
3. Three-Way TOS. This type of TOS reflects the features of one-way and
two-way TOS. One advantage of this format is that it challenges the test
writer to classify objectives based on the theory behind the assessment. It
also shows the variability of thinking skills targeted by the test. However, it
takes much longer to develop this type of TOS.
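One hypothetical way to picture a two-way TOS is as a mapping of content to cognitive level to item count, which also makes it easy to check that the blueprint adds up to the intended test length. The topic names and counts below are illustrative, not taken from a real syllabus.

```python
# Hypothetical sketch: a two-way TOS as a nested mapping of
# content -> cognitive level -> number of items, with a check that the
# blueprint sums to the intended test length. All values are illustrative.

two_way_tos = {
    "Theories and Concepts":   {"Remember": 3, "Understand": 2},
    "Psychoanalytic Theories": {"Remember": 2, "Understand": 4,
                                "Apply": 4, "Analyze": 5},
}

def total_items(tos: dict) -> int:
    """Sum the item counts across all contents and cognitive levels."""
    return sum(sum(levels.values()) for levels in tos.values())

print(total_items(two_way_tos))  # 20
```

A check like this catches the common blueprint error where the per-cell counts no longer match the planned total after revisions.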
Summary
Bloom's taxonomy is a set of three hierarchical models used to classify
learning objectives into levels of complexity and specificity. The three lists
cover the learning objectives in cognitive, affective and psychomotor
domains.
The cognitive domain list has been the primary focus of most traditional
education and is frequently used to structure curriculum learning
objectives, assessments and activities.
In the original version of the taxonomy, the cognitive domain is broken into
the following six levels of objectives, namely: knowledge, comprehension,
application, analysis, synthesis and evaluation.
In the 2001 revised edition of Bloom's taxonomy, the levels are slightly
different: Remember, Understand, Apply, Analyze, Evaluate, Create
(replacing Synthesize).
Knowledge involves recognizing or remembering facts, terms, basic
concepts, or answers without necessarily understanding what they mean.
Comprehension involves demonstrating an understanding of facts and
ideas by organizing, comparing, translating, interpreting, giving
descriptions, and stating the main ideas.
Application involves using acquired knowledge—solving problems in new
situations by applying acquired knowledge, facts, techniques and rules.
Learners should be able to use prior knowledge to solve problems, identify
connections and relationships and how they apply in new situations.
Analysis involves examining and breaking information into component
parts, determining how the parts relate to one another, identifying motives
or causes, making inferences, and finding evidence to support
generalizations.
Enrichment
1. Read the research article titled, “Classroom Test Construction: The Power
of a Table of Specifications” from
https://www.researchgate.net/publication/257822687_Classroom_Test_Co
nstruction_The_Power_of_a_Table_of_Specifications.
2. Watch the video titled, “How to use an automated Table of Specifications:
TOS Made Easy 2019.” Accessible from https://www.youtube.com/watch?
v=75W_N4UKP3A
3. Explore the post of Jessica Shabatura (September 27, 2013) on “Using
Bloom’s Taxonomy to Write Effective Learning Objectives.” Use this link
https://tips.uark.edu/using-blooms-taxonomy/.
4. Watch the video titled, “How to write learning objectives using Bloom’s
Taxonomy.” Accessible from https://www.youtube.com/watch?
v=nq0Ou1li_p0
Assessment
1. Answer the following questions:
1. When planning for a test, what should you do first?
2. Are all instructional objectives measured by a paper-pencil test?
3. When constructing a TOS where objectives are set without classifying
them according to their cognitive behavior, what format do you use?
4. If you designed a two-way TOS for your test, what does this format
have?
5. Why would a teacher consider a three-way TOS rather than the other
formats?
2. To check whether you have learned the important information about
planning the test, please provide your answers to the questions given in
the graphical representation.
2. Sample 2 in Science
Check (√) the competencies appropriate for the given test format or method
Competencies | Appropriate for Objective Type of Test Format | Appropriate for Constructed Test Format | Appropriate for Methods other than a Written Test
3. Sample 3 in Language
Check (√) the competencies appropriate for the given test format or method.
Competencies | Appropriate for Objective Type of Test Format | Appropriate for Constructed Test Format | Appropriate for Methods other than a Written Test
1. Use words that describe persons, places, animals, and events
2. Draw conclusions based on picture-stimuli/passages
3. Write a different story ending
4. Write a simple friendly letter observing the correct format
5. Compose riddles, slogans and announcements from the given stimuli
4. For the table of specifications, you can apply what you have learned by creating a
two-way TOS for the final exams of your class. Take into consideration the
content or topic; the time spent for each topic; the knowledge dimension; and the
item format, number, and placement for each level of cognitive behavior. An example
of a TOS for a long exam for an Abnormal Psychology class is shown below. Some
parts are missing. Complete the TOS based on the given information.
Content | Time Spent | # of Items | KD* | Level of Cognitive Behavior, Item Format, No. and Placement of Items (R / U / AP / AN / E / C)
Disorders Usually First Diagnosed in Infancy, Childhood or Adolescence | 3 hours | ? | F | I.10, #1-10; I.10, #?; I.10, ?
Cognitive Disorders | 3 | ? | C | I.10, ?; I.10, #?; I.10, #?
Substance Related Disorders | 1 | 10% (10) | P | I.5, #?; I.5, #?
Schizophrenia and other Psychotic Disorders | 3 | ? | M | I.10, #?; I.10, #?; I.10, #?
Total | ? | ? | ? | ?; ?; ?
Overall Total | 10 | 100 (100%) | | 45 (45%); 25 (25%); 30 (30%)
5. Test Yourself
Choose the letter of the correct answer to every item given.
1. The instructional objective focuses on the development of learners’
knowledge. Can this objective be assessed using the multiple-choice
format?
A. No, this objective requires an essay format.
B. No, this objective is better assessed using matching type test.
C. Yes, as multiple-choice is appropriate in assessing knowledge.
D. Yes, as multiple-choice is the most valid format when assessing
learning.
2. You prepared an objective test format for your quarterly test in
Mathematics. Which of the following could NOT have been your test
objective?
A. Interpret a line graph
B. Construct a line graph
C. Compare the information presented in a line graph
D. Draw conclusions from the data presented in a line graph
3. Teacher Lanie prepared a TOS as her guide in developing a test. Why
is this necessary?
A. To guide the planning of instruction
B. To satisfy the requirements in developing a test
performance level you are at for (1) setting test objectives and (2) creating a table
of specifications.
Level | Performance Benchmark | Setting Test Objectives | Creating Table of Specifications
Proficient | I know them very well. I can teach others where and when to use them appropriately. | 4 | 4
Master | I can do it by myself, though I sometimes make mistakes. | 3 | 3
Developing | I am getting there, though I still need help to be able to perfect it. | 2 | 2
Novice | I cannot do it by myself. I need help to plan for my tests. | 1 | 1
Based on your self-assessment above, choose from the following tasks to help you
enhance your skills and competencies in setting course objectives and in designing
a table of specifications.
Level Possible Tasks
Proficient Help or mentor peer or classmates who are having difficulty in setting
test objectives and designing table of specifications.
Master Examine the areas that you need to improve on and address them
immediately. Benchmark with the test objectives and TOS developed
by your peers/classmates who are known to be proficient in this area.
Educator’s Feedback
In an interview with a high school teacher, this is what he shared on his
practice when preparing a test:
“When I plan my test, I first design its TOS, so I know what I should
cover. I usually prepare a two-way TOS. Actually, because I have been
teaching the same course for many years now, I have come to a point that
all my tests have their two-way TOS ready to be shown to anybody, most
specially my students. Hence, even at the start of term, I know what I
should teach and how they would be assessed. I know those topics that
are appropriately assessed through a written test. Weeks before the test
is given, I usually give the TOS to my students, so they have a guide in
preparing for the test. I allot time in my class for my students to examine
the TOS of the test for them to check if there were topics not actually
taught in the class. My students usually are surprised when I do this as
they don’t normally see the TOS of their teacher’s test. But I do this as I
want them to be successful. I find it fair for them to know how much
weight is given to every topic covered in the test. Most often, the outcome
of the test is good as almost all, if not all, of my students would pass my
test.”
ERNIE C. CERADO, PhD/MA. DULCE P. DELA CERNA, MIE
SULTAN KUDARAT STATE UNIVERSITY
Pre-discussion
The construction of good tests requires specific skills and experience.
To be able to successfully demonstrate your knowledge and skills in
constructing traditional types of tests that are most applicable to a particular
learning outcome, you should be able to distinguish the different test types
and formats, and understand the process and requirements in setting learning
objectives and outcomes and in preparing the table of specifications. For
proper guidance in this lesson, the performance tasks and success indicators
are presented below.
What to Expect?
At the end of the lesson, the students can:
1. describe the characteristics of selected-response and constructed-
response tests;
2. classify whether a test is selected-response or constructed-response;
3. identify the test format that is most appropriate to a particular learning
outcome/target;
4. apply the general guidelines in constructing test items;
5. prepare a written test based on the prepared TOS; and
6. evaluate a given teacher-made test based on guidelines.
As such, it is important that assessment tasks or tests are meaningful, promote
deep learning, and fulfill the criteria and principles of test construction.
There are many ways by which learners can demonstrate their
knowledge and skills and show evidence of their proficiencies at the end of a
lesson, unit, or subject. While authentic or performance-based assessments
have been advocated as better and more appropriate methods for
assessing learning outcomes, particularly because they assess higher-order
thinking skills (HOTS), traditional written assessment methods, such as multiple-
choice tests, are also considered appropriate and efficient classroom
assessment tools for some types of learning targets. This is especially true for
large classes and when test results are needed immediately for some
educational decisions. Traditional tests are also deemed reliable and, when
well constructed, can exhibit strong content and construct validity.
To learn or enhance your skills in developing good and effective test
items for a particular test format, you need to possess adequate knowledge
of the different test formats; of how and when to choose the format that
best measures the identified learning objectives and desired learning
outcomes of your subject; and of how to construct good and effective items
for each format.
3. Is the test matched or aligned with the course’s DLOs and the course
contents or learning activities?
they are limited when assessing learning outcomes that involve more
complex and higher-level thinking skills. Selected-response tests include:
Multiple Choice Test. It is the most commonly used format in formal
testing and typically consists of a stem (problem), one correct or best
alternative (correct answer), and three or more incorrect or inferior alternatives
(distractors).
True-False or Alternative Response Test. It generally presents a
statement, and the learner decides whether the statement is true
(accurate/correct) or false (inaccurate/incorrect).
Matching Type Test. It consists of 2 sets of items to be matched with
each other based on a specified attribute.
Constructed-response tests require learners to supply answers to a
given question or problem. These include:
Short Answer Test. It consists of open-ended questions or incomplete
sentences that require learners to create an answer for each item, which is
typically a single word or short phrase. This includes the following types:
Completion. It consists of incomplete statements that require the
learners to fill in the blanks with the correct word or phrase.
Identification. It consists of statements that require the learners to
identify or recall the terms/concepts, people, places or events that
are being described.
Essay Test. It consists of problems/questions that require learners to
compose or construct written responses, usually long ones with several
paragraphs.
Problem-solving Test. It consists of problems/questions that require
learners to solve problems in quantitative or non-quantitative settings
using knowledge and skills in mathematical concepts and procedures,
and/or other higher-order cognitive skills (e.g., reasoning, analysis, and
critical thinking).
written test items could be confusing and frustrating to learners and yield test
scores that are not appropriate for evaluating their learning and achievement.
The following are the general guidelines in writing good multiple-choice items.
They are classified in terms of content, stem, and options.
A. Content
1. Write items that reflect only one specific content and cognitive processing
skill.
Faulty: Which of the following is a type of statistical procedure used to test
a hypothesis regarding significant relationship between variables,
particularly in terms of the extent and direction of association?
A. ANCOVA C. Correlation
B. ANOVA D. t-test
Good:
A. ANCOVA C. Chi-Square
B. ANOVA D. Mann-Whitney Test
2. Do not lift and use statements from the textbook or other learning materials
as test questions.
3. Keep the vocabulary simple and understandable based on the level of the
learners/examinees.
4. Edit and proofread the items for grammatical and spelling errors before
administering the test to the learners.
B. Stem
1. Write the directions in the stem in a clear and understandable manner.
Faulty: Read each question and indicate your answer by shading the circle
corresponding to your answer.
Good: This test consists of two parts. Part A is a reading comprehension
test, and Part B is a grammar/language test. Each question is a
multiple-choice item with five (5) options. You need to answer
each question, but you will not be penalized for a wrong answer or for
guessing. You can go back and review your answers during the time
allotted.
2. Write stems that are consistent in form and structure, that is, present all
items either in question form or in description or declarative form.
Faulty: (1) Who was the Philippine president during Martial Law?
(2) The first president of the Commonwealth of the Philippines was
_______.
Good: (1) Who was the Philippine president during Martial Law?
(2) Who was the first president of the Commonwealth of the
Philippines?
3. Express the stem positively and avoid double negatives, such as NOT and
EXCEPT in a stem. If a negative word is necessary, underline or capitalize
the words for emphasis.
Faulty: Which of the following is not the measure of variability?
Good: Which of the following is NOT a measure of variability?
4. Refrain from making the stem too wordy or containing too much
information unless the problem or question requires the facts presented to
solve the problem.
Faulty: What does DNA stand for, and what is the organic chemical of
complex molecular structure found in all cells and viruses and codes
genetic information for the transmission of inherited traits?
Good: As a chemical compound, what does DNA stand for?
C. Options
1. Provide three (3) to five (5) options per item, with only one being the correct
or best answer/alternative.
2. Write options that are parallel or similar in form and length to avoid giving
clues about the correct answer.
Faulty: What is an ecosystem?
3. For each item, include only topics that are related with one another and
share the same foundation of information.
Faulty: Match the following:
A B
_____1. Indonesia A. Asia
_____2. Malaysia B. Bangkok
_____3. Philippines C. Jakarta
_____4. Thailand D. Kuala Lumpur
_____5. Year ASEAN was established E. Manila
F. 1967
Good: On the line to the left of each country in Column I, write the letter of the
country’s capital presented in column II.
Column I Column II
_____1. Indonesia A. Bandar Seri Begawan
_____2. Malaysia B. Bangkok
_____3. Philippines C. Jakarta
_____4. Thailand D. Kuala Lumpur
E. Manila
Item #1 is considered an unacceptable item because its response
options are not parallel and include different kinds of information that
can provide clues to the correct/wrong answers. On the other hand,
item #2 details the basis for matching and the response options only
include related concepts.
4. Make the response options short, homogeneous, and arranged in logical
order.
Faulty: Match the chemical elements with their characteristics.
A B
_____ Gold A. Au
_____ Hydrogen B. Magnetic metal used in steel
_____ Iron C. Hg
_____ Potassium D. K
_____ Sodium E. With lowest density
F. Na
Good: Match the chemical elements with their symbols.
A B
_____ Gold A. Au
_____ Hydrogen B. Fe
_____ Iron C. H
_____ Potassium D. Hg
_____ Sodium E. K
F. Na
In item #1, the response options are not parallel in content and length.
They are also not arranged alphabetically.
5. Include response options that are reasonable, realistic, and similar in
length and grammatical form.
Faulty: Match the subjects with their course description.
A B
___ History A. Studies the production and distribution of
goods/services
___ Political Science B. Study of politics and power
___ Psychology C. Study of society
___ Sociology D. Understand role of mental functions in social
behaviour
E. Uses narratives to examine and analyze past
events
Good: Match the fractions with their decimal equivalents.
A B
___ 1/4 A. 0.09
___ 5/4 B. 0.25
___ 7/25 C. 0.28
___ 9/10 D. 0.90
E. 1.25
Item #1 is considered inferior to item #2 because it includes the same
number of response options as that of the stimuli, thus making it more
prone to guessing.
can create items that minimize guessing and avoid giving clues to the correct
answer.
The following are the general guidelines in writing good fill-in-the-blank
or completion test items:
1. Omit only significant words from the statement.
Faulty: Every atom has a central _____ called a nucleus.
Good: Every atom has a central core called a(n) ______.
In item #1, the word “core” is not the significant word. The item is also
prone to many and varied interpretations, resulting in many possible
answers.
2. Do not omit too many words from the statement such that the intended
meaning is lost.
Faulty: _______ is to Spain as the _______ is to United States and as
_______ is to Germany.
Good: Madrid is to Spain as the ______ is to France.
Item # 1 is prone to many and varied answers. For example, a student
may answer the question based on the capital of these countries or based
on what continent they are located. Item # 2 is preferred because it is
more specific and requires only one correct answer.
3. Avoid obvious clues to the correct response.
Faulty: Ferdinand Marcos declared martial law in 1972. Who was the
president during that period?
Good: The president during the martial law years was ___.
Item #1 already gives a clue that Ferdinand Marcos was the president
during this time because only the president of a country can declare
martial law.
4. Be sure that there is only one correct response.
Faulty: The government should start using renewable energy sources
for generating electricity, such as ____.
Good: The government should start using renewable sources of energy
by using turbines called ___.
Item #1 has many possible answers because the statement is very
general (e.g., wind, solar, biomass, geothermal, and hydroelectric). Item #
2 is more specific and only requires one correct answer (i.e., wind).
There are two types of essay test: (1) extended-response essay and
(2) restricted-response essay.
how much time they should allocate for each item, especially if several
essay questions are presented. How the responses are to be graded
or rated should also be clarified to guide the students on what to
include in their responses.
Example: What is the mean of the following score distribution: 32, 44, 56,
69, 75, 77, 95, 96?
A. 68 D. 74
B. 69 E. 76
C. 72
2. All possible answer choices - This type of question has four or five
options, and students are required to choose all of the options that are
correct.
Example: Consider the following score distribution: 12, 14, 14, 14, 17, 24,
27, 28, and 30. Which of the following is/are the correct measure/s of
central tendency? Indicate all possible answers.
A. Mean = 20 D. Median = 17
B. Mean = 22 E. Mode = 14
C. Median = 16
Options A, D, and E are all correct answers.
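The keyed answers can be verified with Python's standard statistics module; this is a minimal check of the item above, not part of the original text:

```python
import statistics

# Score distribution from the sample item
scores = [12, 14, 14, 14, 17, 24, 27, 28, 30]

mean_score = statistics.mean(scores)      # sum of scores / number of scores
median_score = statistics.median(scores)  # middle value of the ordered scores
mode_score = statistics.mode(scores)      # most frequently occurring score

print(mean_score, median_score, mode_score)  # 20 17 14
```

This confirms options A (Mean = 20), D (Median = 17), and E (Mode = 14).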
3. Type-in answer – This type of question does not provide options to
choose from. Instead, the learners are asked to supply the correct
answer. The teacher should inform the learners at the start how their
answers will be rated. For example, the teacher may require just the
correct answer or may require learners to present the step-by-step
procedure in coming up with their answers. On the other hand, for non-
mathematical problem solving, such as a case study, the teacher may
present a rubric on how the answers will be rated.
Example: Compute the mean of the following score distribution: 32, 44,
56, 69, 75, 77, 95, and 96. Indicate your answer in the blank
provided.
In this case, the learners will only need to give the correct answer
without having to show the procedures for computation.
Example: Lillian, a 55-year-old accountant, has been suffering from
frequent dizziness, nausea, and light-headedness. During the
interview, Lillian was obviously restless and sweating. She
reported feeling so stressed and fearful of anything without any
apparent reason. She could not sleep or eat well. She also
started to withdraw from family and friends, as she experienced
frequent panic attacks. She also said that she was constantly
worrying about everything at work and at home. What might be
Lillian’s problem? What should she do to alleviate all her
symptoms?
Problem-solving test items are a good test format as they minimize
guessing, measure instructional objectives that focus on higher cognitive
levels, and cover an extensive amount of content or topics. However,
they require more time for teachers to construct, read, and correct, and
are prone to rater bias, especially when scoring rubrics/criteria are not
available. It is therefore important that good-quality problem-solving test
items are constructed.
The following are some of the general guidelines in constructing
good problem-solving test items:
1. Identify and explain the problem clearly.
Faulty: Tricia was 135.6 lbs. when she started with her zumba
exercises. After three months of attending the sessions three
times a week, her weight was down to 122.8 lbs. About how
many lbs. did she lose after three months? Write your final
answer in the space provided and show your computations.
[This question asks “about how many” and does not indicate whether
learners need to give the exact weight or whether they need to round
off their answer and to what extent.]
Good: Tricia was 135.6 lbs. when she started with her zumba
exercises. After three months of attending the sessions three
times a week, her weight was down to 122.8 lbs. How many
lbs. did she lose after three months? Write your final answer in
the space provided and show your computations. Write the
exact weight; do not round off.
2. Be specific and clear about the type of response required from the
students.
Faulty: ASEANA Bottlers, Inc. has been producing and selling Tutti
Fruity juice in the Philippines, aside from their Singapore market.
The sales for the juice in the Singapore market were $5 million
Good: ASEANA Bottlers, Inc. has been producing and selling Tutti
Fruity juice in the Philippines, aside from their Singapore market.
The sales for the juice in the Singapore market were S$5
million more than those of their Philippine market in 2016, S$3
million more in 2017, and S$4.5 million more in 2018. If the sales in
the Philippine market in 2018 were PHP 35 million, what were the
sales in the Singapore market during that year? Provide the answer in
Singapore dollars (S$1 = PHP 36.50). [This is a better item
because it specifies in what currency the answer should be
presented, and the exchange rate is given.]
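A sketch of the intended computation, assuming the comparison is between the Philippine and Singapore markets and that 2018 Singapore sales were S$4.5 million more than Philippine sales (both readings of the item are assumptions for illustration):

```python
# Assumed reading of the item: 2018 Singapore sales exceed
# 2018 Philippine sales by S$4.5 million
philippine_sales_php = 35_000_000  # PHP 35 million, Philippine market, 2018
php_per_sgd = 36.50                # exchange rate: S$1 = PHP 36.50
sgd_gap_2018 = 4_500_000           # assumed S$ gap for 2018

philippine_sales_sgd = philippine_sales_php / php_per_sgd
singapore_sales_sgd = philippine_sales_sgd + sgd_gap_2018

print(round(singapore_sales_sgd, 2))  # 5458904.11, i.e., about S$5.46 million
```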
3. Specify in the directions the bases for grading students’
answer/procedures.
Faulty: VCV Consultancy Firm was commissioned to conduct a survey
on the voters’ preferences in Visayas and Mindanao for the
upcoming presidential election. In Visayas, 65% are for the Liberal
Party (LP) candidate, while 35% are for the Nationalists. In
Mindanao, 70% of the voters are Nationalists, while 30% are LP
supporters. A survey was conducted among 200 voters for each
region. What is the probability that the survey will show a greater
percentage of Liberal Party supporters in Mindanao than in the
Visayas region?
Good: VCV Consultancy Firm was commissioned to conduct a survey
on the voters’ preferences in Visayas and Mindanao for the
upcoming presidential election. In Visayas, 65% are for the Liberal
Party (LP) candidate, while 35% are for the Nationalist Party
(NP) candidate. In Mindanao, 70% of the voters are Nationalist,
while 30% are LP supporters. A survey was conducted among
200 voters for each region.
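For reference, an item like this is usually keyed with the normal approximation to the sampling distribution of the difference between two sample proportions. The sketch below follows that standard approach; it is an assumed solution path, not one stated in the text:

```python
import math

# Population proportions of LP supporters, taken from the item
p_visayas = 0.65
p_mindanao = 0.30
n = 200  # voters surveyed per region

# Standard error of (sample Mindanao LP share - sample Visayas LP share)
se = math.sqrt(p_visayas * (1 - p_visayas) / n
               + p_mindanao * (1 - p_mindanao) / n)

# We need P(difference > 0) when the true mean difference is 0.30 - 0.65 = -0.35
z = (0 - (p_mindanao - p_visayas)) / se

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

probability = 1 - normal_cdf(z)
print(probability)  # effectively zero: a 35-point true gap is all but impossible to reverse
```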
Assessment
A. Let us review what you have learned about constructing traditional tests.
1. What factors should be considered when choosing a particular test
format?
2. What are the major categories and formats of traditional tests?
3. When are the following traditional tests appropriate to use?
- Multiple-choice test
- Matching-type test
- True or false test
- Short-answer test
- Essay test
- Problem-solving test
4. How should the items for the above traditional tests be constructed?
To check whether you have learned the important information about
constructing the traditional types of tests, please complete the following
graphical representation:
Learning outcome: Apply the concepts of demand and supply in actual cases
Topics: Effects of change of demand and supply on market price; Exchange
Rate; Change in the Price of Goods in the Market; Price Ceiling and Price
Floor
Assessment methods: Essay, problem sets, case analysis, and exercises;
Others
B. Now that you are able to identify the types of assessment that you will
employ for each desired learning outcome for a subject, you are now
ready to construct sample tests for the subject. Construct a three-part test
that includes test formats of your choice. In the development of the test,
you will need the following information:
1. Desired learning outcomes for subject area.
2. Level of cognitive/thinking skills appropriate to assess the desired
learning outcomes
3. Appropriate test format to use
C. Review the true-false items you have written using the checklist below:
Yes No
1. Is the item completely true or completely false?
2. Is the item written in simple, easy-to-follow statements?
3. Are negatives avoided?
4. Are absolutes such as “always” and “never” used sparingly
or not at all?
5. Do items express only a single idea?
6. Is the use of unfamiliar vocabulary avoided?
7. Is the item or statement not lifted from the text, lecture, or
other materials?
D. Evaluate the level of your skills in developing different test formats using the
following scale:
Level Performance Benchmarking Multiple-Choice Matching-Type True-False Short-Answer Essay
Proficient I know this very well. I can teach others how to make one. 4 4 4 4 4
Master I can do it by myself, though I sometimes make mistakes. 3 3 3 3 3
Developing I am getting there, though I still need help to be able to perfect it. 2 2 2 2 2
Novice I cannot do it myself. I need help to make a good/effective test. 1 1 1 1 1
E. Based on your self-assessment, choose from the following tasks to help you enhance
your skills and competencies in developing different test formats:
Level Possible Tasks
Proficient Help or mentor peer/classmates who are having
difficulty in developing good items for their course
assessment.
Master Examine the areas that you need to improve on and
address them immediately.
Developing/ Read more books/references on how to develop
Novice effective items.
Work and collaborate with your peer/classmates in
developing a particular test format.
Ask your teacher to evaluate the items that you have
developed and to give suggestions on how you can
improve your skills in constructing items.
F. Test your understanding about constructing test items for different test
formats. Answer the following items.
1. What are these statements that learners are expected to do or
demonstrate as a result of engaging in the learning process?
A. Desired learning outcomes C. Learning intents
B. Learning goals D. Learning objectives
2. Which of the following is NOT a factor to consider when choosing a
particular test format?
A. Desired learning outcomes of the lesson
B. Grade level of students
C. Learning activities
D. Level of thinking to be assessed
3. Ms. Daniel is planning to use a traditional/conventional type of
classroom assessment for her Trigonometry quarterly quiz. Which of
the following test formats will she likely NOT use?
Educators’ Feedback
Ms. Cudera teaches Practical Research 1 and 2 in a public senior high
school. When asked about his experiences in writing test items for his
subjects, he cited his practice of referring back to the appropriate cognitive
domain expected of every learning competency and using varied types of
assessments to measure his students’ achievement of these expected
outcomes. This is what he shared:
“As a teacher in senior high school, I always make sure that my periodical
exams measure the expected learning competencies as stipulated in the
curriculum guide of the Department of Education. I then create a table of
specifications, wherein I follow the correct item allocation per competency based
on the number of hours being taught in the class and the expected learning
outcomes as specified in the DepEd Curriculum Guide, and in assessing
students, I am always guided by DepEd Order No. 8, s. 2015, also known as
the Policy Guidelines on Classroom Assessment for the K to 12 Basic Education
Program.
For this school year, I was assigned to teach Practical Research 1 and 2
courses. To assess students’ learning or achievement, I first conducted
formative assessment to provide me some background on what students know
about Research. The result of the formative assessment allowed me to revise
my lesson plans and gave me some directions on how to proceed with and
handle the courses.
As part of the course requirements, I gave the students a lot of writing activities,
wherein they were required to write drafts of each part of the research. For each
work submitted, I read, checked, and gave comments and suggestions on how
to improve their drafts. I then allowed them to rewrite and revise their works. The
final research paper is used as the basis for summative assessment.
Furthermore, I also relied heavily on essay tests and other performance tasks.
As I have mentioned, I required students to produce or write the different parts
of a research paper as outputs. They were also required to gather data for their
research. I utilized a rubric that was conceptualized collaboratively with my
students in order to evaluate their outputs. I used a 360-degree evaluation of
their output, wherein, aside from my assessment, group members would assess
one another’s work and the leader would also evaluate the work of the members.
I also conducted item analysis after every periodical exam to identify the least
mastered competencies for a given period, which helps improve the performance of
the students.”
Pre-discussion
By now, it is assumed that you know how to plan a classroom test by
specifying the purpose for constructing it, identifying the instructional outcomes to
be assessed, and preparing a test blueprint to guide the construction process.
The techniques and strategies for selecting and constructing different item
formats to match the intended instructional outcomes make up the second
phase of the test development process, which is the content of the preceding
lesson. The process, however, is not complete without ensuring that the
classroom instrument is valid for the purpose for which it is intended. Ensuring
validity requires reviewing and improving the items, which is the next stage in the
process. This lesson offers pre-service teachers practical and
necessary ways of improving teacher-developed assessment tools.
What to Expect?
At the end of the lesson, the students can:
1. list down the different ways for judgmental item-improvement and
other empirically-based procedures;
2. evaluate which type of test item-improvement is appropriate to use;
3. compute and interpret the results for index of difficulty, index of
discrimination and distracter efficiency; and
4. demonstrate knowledge on the procedures for improving a classroom-
based assessment.
Judgmental Item-Improvement
This approach basically makes use of human judgment in reviewing
the items. The judges are teachers themselves who know exactly what the
test for, the instructional outcomes to be assessed, and the items’ level of
difficulty appropriate to his/her class; the teacher’s peers or colleagues who
are familiar with the curriculum standards for the target grade level, the
subject matter content, and the ability of the learners; and the students
themselves who can perceive difficulties based on their past experiences.
Teachers’ Own Review
Peer review
There are schools that encourage peer or collegial review of
assessment instruments among themselves. Time is provided for this activity
and it has almost always yielded good results for improving tests and
performance-based assessment tasks. During these teacher dyad or triad
sessions, those teaching the same subject area can openly review together
the classroom tests and tasks they have devised against some consensual
criteria. The suggestions given by test experts can actually be used collegially
as basis for a review checklist:
a. Do the items follow the specific and general guidelines in writing
items especially on:
Being aligned to instructional objectives?
Making the problem clear and unambiguous?
Providing plausible options?
Avoiding unintentional clues?
Having only one correct answer?
b. Are the items free from inaccurate content?
c. Are the items free from obsolete content?
d. Are the test instructions clearly written for students to follow?
e. Is the level of difficulty of the test appropriate to level of learners?
f. Is the test fair to all kinds of students?
Student Review
Engaging students in reviewing items has become a laudable
practice for improving classroom tests. The judgment is based on the students’
experience in taking the test and their impressions and reactions during the
testing event. The process can be efficiently carried out through the use of a
review questionnaire. Popham (2011) illustrates a sample questionnaire,
shown in the textbox below. It is better to conduct the review activity a day
after taking the test, so the students still remember the experience when they
see a blank copy of the test.
Item-Improvement Questionnaire for Students
If any of the items seemed confusing, which ones were they?
Did any items have more than one correct answer? If so, which ones?
Did any items have no correct answers? If so, which ones?
Were there words in any item that confused you? If so, which ones?
Were the directions for the test, or for particular sub-sections,
unclear? If so, which ones?
Another technique for eliciting student judgment for item improvement is
for the teacher to go over the test with the students before the results are shown.
Students usually enjoy this activity since they can get feedback on the
answers they have written. As the class tackles each item, the students can be
asked to give their answers, and if there is more than one possible correct answer,
the teacher makes notations for item alterations. Having more than one correct
answer signals ambiguity either in the stem or in the given options. The
teacher may also take the chance to observe sources of confusion, especially
when answers vary. During this session, it is important for the teacher to
maintain an atmosphere that allows students to question and give
suggestions. It also follows that after an item review session, the teacher
should be willing to modify incorrectly keyed answers.
Empirically-based Procedures
Item improvement using empirically-based methods is aimed at
improving the quality of an item using students’ responses to the test. Test
developers refer to this technical process as item analysis, as it utilizes data
obtained separately for each item. An item is considered good when its
quality indices, i.e., difficulty index and discrimination index, meet certain
characteristics. For a norm-referenced test, these two indices are related.
Difficulty Index
An item is difficult if the majority of students are unable to provide the
correct answer; it is easy if the majority of the students are able to answer
correctly. An item can discriminate if the examinees who score high in the test
answer the item correctly more often than the examinees who got low scores.
Below is a data set of five items on the addition and subtraction of
integers. Follow the procedure to determine the difficulty and discrimination of
each item.
1. Get the total score of each student and arrange scores from highest to
lowest.
Item 1 Item 2 Item 3 Item 4 Item 5
Student 1 0 0 1 1 1
Student 2 1 1 1 0 1
Student 3 0 0 0 1 1
Student 4 0 0 0 0 1
Student 5 0 1 1 1 1
Student 6 1 0 1 1 0
Student 7 0 0 1 1 0
Student 8 0 1 1 0 0
Student 9 1 0 1 1 1
Student 10 1 0 1 1 0
2. Obtain the upper and lower 27% of the group. Multiplying 0.27 by the total
number of students (10) gives 2.7, which rounds to 3. Get the top three
students and the bottom three students based on their scores. The top
three students are Students 2, 5, and 9. The bottom three students are
Students 7, 8, and 4. The rest of the students are not included in the item
analysis.
3. Obtain the proportion of correct answers for each item. This is computed
separately for the upper 27% group and the lower 27% group by summing
the correct answers per item and dividing by the number of students in the
group.
Upper 27% Item 1 Item 2 Item 3 Item 4 Item 5 Total
Student 2 1 1 1 0 1 4
Student 5 0 1 1 1 1 4
Student 9 1 0 1 1 1 4
Total 2 2 3 2 3
Proportion of the 0.67 0.67 1.00 0.67 1.00
high group (pH)
Lower 27% Item 1 Item 2 Item 3 Item 4 Item 5 Total
Student 7 0 0 1 1 0 2
Student 8 0 1 1 0 0 2
Student 4 0 0 0 0 1 1
Total 0 1 2 1 1
Proportion of the 0.00 0.33 0.67 0.33 0.33
low group (pL)
4. Compute the item difficulty as the average of the proportions of the two
groups: Item difficulty (p) = (pH + pL) / 2
Computations
Item 1: (0.67 + 0.00) / 2 = 0.335
Item 2: (0.67 + 0.33) / 2 = 0.500
Item 3: (1.00 + 0.67) / 2 = 0.835
Item 4: (0.67 + 0.33) / 2 = 0.500
Item 5: (1.00 + 0.33) / 2 = 0.665
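The steps above can be automated with a short script. This sketch (Python, for illustration) reuses the 10-student response matrix and the upper/lower groups identified in the text, and assumes the difficulty index is taken as the average of the high- and low-group proportions, a common convention when only the upper and lower 27% are analyzed:

```python
# Responses (1 = correct, 0 = wrong) to Items 1-5, keyed by student number
responses = {
    1: [0, 0, 1, 1, 1],  2: [1, 1, 1, 0, 1],  3: [0, 0, 0, 1, 1],
    4: [0, 0, 0, 0, 1],  5: [0, 1, 1, 1, 1],  6: [1, 0, 1, 1, 0],
    7: [0, 0, 1, 1, 0],  8: [0, 1, 1, 0, 0],  9: [1, 0, 1, 1, 1],
    10: [1, 0, 1, 1, 0],
}

# Upper and lower 27% groups (3 students each), as selected in the text;
# ties at the cut-off score are resolved the same way the example does
upper = [2, 5, 9]
lower = [7, 8, 4]

def proportion(group, item):
    """Share of a group answering an item (0-indexed) correctly."""
    return round(sum(responses[s][item] for s in group) / len(group), 2)

p_h = [proportion(upper, i) for i in range(5)]  # [0.67, 0.67, 1.0, 0.67, 1.0]
p_l = [proportion(lower, i) for i in range(5)]  # [0.0, 0.33, 0.67, 0.33, 0.33]

# Assumed convention: difficulty index = average of the two group proportions
difficulty = [(h + l) / 2 for h, l in zip(p_h, p_l)]
for i, p in enumerate(difficulty, start=1):
    print(f"Item {i}: p = {p:.3f}")
```

Since a higher p means more examinees answered correctly, Item 3 (p = 0.835) is the easiest item for this class and Item 1 (p = 0.335) is the hardest.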
Discrimination Index
Obviously, the power of an item to discriminate between informed and
uninformed groups, or between more knowledgeable and less knowledgeable
learners, is shown by the item-discrimination index (D). This is an item
statistic that can reveal useful information for improving an item. Basically,
an item-discrimination index shows the relationship between a student’s
performance on an item (i.e., right or wrong) and his or her total performance on the
test, represented by the total score. The item-total correlation is usually part of a
package from item analysis. High item-total correlations indicate that
the items contribute well to the total score, so that responding correctly to these
items gives a better chance of obtaining relatively high total scores in the whole
test or subtest.
For classroom tests, the discrimination index shows if a difference
exists between the performance of those who scored high and those who
scored low in the item. As a general rule, the higher the discrimination index
(D), the more marked the magnitude of the difference is, and thus, the more
discriminating the item is. The nature of the difference however, can take
different directions.
a. Positively discriminating item – proportion of high scoring group is
greater than that of the low scoring group
b. Negatively discriminating item – proportion of high scoring group is
less than that of the low scoring group
c. Not discriminating item – proportion of high scoring group is equal
to that of the low scoring group
Computing the discrimination index therefore requires obtaining the
difference between the proportion of the high-scoring group getting the item
correctly and the proportion of the low-scoring group getting the item correctly
using this simple formula:
D = RU/TU – RL/TL
where D = item discrimination index
RU = number in the upper group getting the item correct
TU = number in the upper group
RL = number in the lower group getting the item correct
TL = number in the lower group
Another calculation brings about the same result when the two groups are of
equal size:
D = (RU – RL)/T
where RU = number in the upper group getting the item correct
RL = number in the lower group getting the item correct
T = number of students in either group
As you can see, R/T is actually the p-value of an item. So getting
D amounts to getting the difference between the p-value of the upper group
and the p-value of the lower group. The formula for the discrimination index
(D) can therefore also be given as (Popham, 2011):
D = pU – pL
where pU is the p-value for upper group (RU/TU)
pL is the p-value for lower group (RL/TL)
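The whole upper–lower procedure above can be sketched in code. This is a minimal illustration using the ten-student response matrix from the worked example; the group membership and formulas follow the text (difficulty taken as the average of pH and pL, and D = pU – pL).

```python
# Upper-lower (27%) item analysis for the worked example above.
# Each row holds one student's responses to the five items (1 = correct).
responses = {
    1: [0, 0, 1, 1, 1], 2: [1, 1, 1, 0, 1], 3: [0, 0, 0, 1, 1],
    4: [0, 0, 0, 0, 1], 5: [0, 1, 1, 1, 1], 6: [1, 0, 1, 1, 0],
    7: [0, 0, 1, 1, 0], 8: [0, 1, 1, 0, 0], 9: [1, 0, 1, 1, 1],
    10: [1, 0, 1, 1, 0],
}
upper = [2, 5, 9]   # top 27% (3 students) by total score
lower = [7, 8, 4]   # bottom 27% (3 students)

difficulty, discrimination = [], []
for item in range(5):
    p_h = sum(responses[s][item] for s in upper) / len(upper)  # pH
    p_l = sum(responses[s][item] for s in lower) / len(lower)  # pL
    difficulty.append((p_h + p_l) / 2)   # average proportion correct
    discrimination.append(p_h - p_l)     # D = pU - pL

for i, (p, d) in enumerate(zip(difficulty, discrimination), start=1):
    print(f"Item {i}: difficulty = {p:.2f}, D = {d:.2f}")
```

Item 3, for instance, comes out easy (p ≈ 0.83) yet still positively discriminating (D ≈ 0.33), matching the proportions tabulated earlier.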
Distracter Analysis
Another empirical procedure to discover areas for item improvement
utilizes an analysis of the distribution of responses across the distractors.
When the difficulty index and discrimination index of an item suggest that it
is a candidate for revision, distractor analysis becomes a useful follow-up.
In distractor analysis, however, we are no longer interested in how test
takers select the correct answer, but in how well the distractors function
by drawing test takers away from the correct answer. The number of times each
distractor is selected is noted in order to determine its effectiveness. We
would expect a viable distractor to be selected by enough examinees. What
exactly is an acceptable value? This depends to a large extent on the
difficulty of the item itself and what we consider an acceptable item
difficulty value for test items. If we assume that 0.7 is an appropriate item
difficulty value, then we should expect the remaining 0.3 to be about evenly
distributed among the distractors. Let us take the following test item as an
example:
In the story, he was unhappy because …………
A. it rained all day
B. he was scolded
C. he hurt himself
D. the weather was hot
Let us assume that 100 students took the test. If we assume that A is
the answer and the item difficulty is 0.7, then 70 students answered correctly.
What about the remaining 30 students and the effectiveness of the three
distractors? If all 30 selected D, the distractors B and C are useless in their
role as distractors. Similarly, if 15 students selected D and another 15
selected B, then C is not an effective distractor and should be replaced. The
ideal situation would be for each of the three distractors to be selected by 10
students. Therefore, for an item which has an item difficulty of 0.7, the ideal
effectiveness of each distractor can be quantified as 10/100 or 0.1. What
would be the ideal value for the distractors in a four-option multiple-choice
item when the item difficulty is 0.4? Hint: You need to identify the
proportion of students who did not select the correct option.
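The arithmetic in this paragraph generalizes: whatever proportion does not choose the key should ideally be split evenly among the remaining options. A small sketch (the function name is ours, not from the text):

```python
def ideal_distractor_share(difficulty: float, n_options: int) -> float:
    """Ideal proportion of examinees drawn by each distractor."""
    # Examinees who missed the item, spread evenly over the distractors.
    return (1 - difficulty) / (n_options - 1)

# The worked example: p = 0.7, four options -> 0.3 / 3 = 0.1 per distractor
print(ideal_distractor_share(0.7, 4))
```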
From a different perspective, the item discrimination formula can also
be used in distractor analysis. The concept of upper and lower groups still
applies, but the analysis and expectation differ slightly from the regular
item discrimination we looked at earlier. Instead of expecting a positive
value, we should logically expect a negative value, as more students from the
lower group should select the distractors. Each distractor can have its own
discrimination value, allowing us to analyse how the distractors work and
ultimately refine the effectiveness of the test item itself. If we use the
above item as an example, the item discrimination concept can be used to
assess the effectiveness of each distractor. Consider a class of 100 students;
we then form upper and lower groups of 30 students each. Assume the following
results are observed:
Summary
Enrichment
Read the following studies:
1. “Difficulty Index, Discrimination Index and Distractor Efficiency in
Multiple Choice Questions,” available from
https://www.researchgate.net/publication/323705126
2. “Item Discrimination and Distractor Analysis: A Technical Report on
Thirty Multiple Choice Core Mathematics Achievement Test Items,”
available from https://www.researchgate.net/publication/335892361
3. “Index and Distractor Efficiency in a Formative Examination in
Community Medicine,” available from
https://www.researchgate.net/publication/286478898
4. “Impact of distractors in item analysis of multiple choice questions.”
Available from : https://www.researchgate.net/publication/332050250
Assessment
A. Below are descriptions of procedures done to review and improve test
items. On the space provided, write J if a judgmental approach is used and
E if an empirical approach is used.
C. Below are additional data collected for the same items. Calculate another quality
index and indicate what needs to be done with the obtained index as a basis.
Item   Upper Group   Lower Group   Index   Revision needed to be done
1 25 9
2 9 9
3 2 8
4 38 8
5 1 7
D. A distracter analysis is given for a test item given to a class of 60. Obtain the
necessary item statistics using the given data.
Item   Difficulty   Discrimination   Group          Alternatives
       index        index            (N=30)    A   B   C   D   Omit
1                                    Upper
                                     Lower
E. For each item, write the letter of the correct answer on the space
provided.
1. Below are different ways of utilizing the concept of discrimination as an
index of item quality EXCEPT
a. Getting the proportion of those answering the item correctly over
the total number answering the item
b. Obtaining the difference between the proportion of high-scoring
group and the proportion of low-scoring group getting the item
correctly
c. Getting how much better the performance of the class by item is
after instruction than before
d. Differentiating the performance in an item of a group that has
received instruction and a group that has not
2. What can enable some students to answer items correctly even without
having enough knowledge for what is intended to be measured?
a. Clear and brief test instructions
b. Comprehensible statement of the item stem
c. Obviously correct and obviously wrong alternatives
d. Simple sentence structure of the problem
References
Conduct the Item Analysis. Retrieved from
http://www.proftesting.com/test_topics/steps_9.php
Pre-discussion
To be able to successfully perform the expected performance tasks,
students should have prepared a test following the proper procedure with
clear learning targets (objectives), table of specifications, and pre-test data
per item. In the previous lesson, guidelines were provided in constructing test
following different formats. They have also learned that assessment becomes
valid when the test items represent a good set of objectives, and this should
be found in table of specifications. The learning objectives or targets will help
them construct appropriate test items.
What to Expect?
At the end of this lesson, the students can:
Test Validity
A test is valid when it measures what it is supposed to measure.
Validity pertains to the connection between the purpose of the test and which
data the teacher chooses to quantify that purpose.
If a quarterly exam is valid, then its contents should directly measure
the objectives of the curriculum. If a scale that measures personality is
composed of five factors, then each factor should be made up of items that
are highly correlated with one another. If an entrance exam is valid, it
should predict students' grades after the first semester.
It is better to understand the definition through looking at examples of
invalidity. Colin Foster, an expert in mathematics education at the University
of Nottingham, gives the example of a reading test meant to measure literacy
that is given in a very small font size. A highly literate student with bad
eyesight may fail the test because they cannot physically read the passages
supplied. Thus, such a test would not be a valid measure of literacy (though it
may be a valid measure of eyesight). Such an example highlights the fact that
validity is wholly dependent on the purpose behind a test. More generally, in a
study plagued by weak validity, “it would be possible for someone to fail the
test situation rather than the intended test subject.”
A case is provided for each type of validity that illustrates how it
is conducted. After reading the cases and references about the different
kinds of validity, look for a partner and answer the following questions.
Discuss your answers. You may use other references and browse the internet.
1. Content Validity
A coordinator in science is checking the science test paper for Grade 4.
She asked the Grade 4 science teacher to submit the table of specifications
containing the objectives of the lesson and the corresponding items. The
coordinator checked whether each item is aligned with the objectives.
How are the objectives used when creating test items?
How is content validity determined when given the objectives and the
items in a test?
What should be present in a test table of specifications when
determining content validity?
Who checks the content validity of items?
2. Face Validity
The assistant principal browsed the test paper made by the math
teacher. She checked if the contents of the items are about mathematics. She
examined if instructions are clear. She browsed through the items if the
grammar is correct and if the vocabulary is within the student’s level of
understanding.
What can be done in order to ensure that the assessment appears to be
effective?
What practices are done in conducting face validity?
Why is face validity the weakest form of validity?
3. Predictive Validity
The school admission’s office developed an entrance examination. The
officials wanted to determine if the results of the entrance examination are
accurate in identifying good students. They took the grades of the students
accepted for the first quarter. They correlated the entrance exam results and
the first quarter grades. They found significant and positive correlations
between the entrance examination scores and the grades. The entrance
examination results predicted the grades of students after the first quarter.
Thus, there was predictive validity.
Why are two measures needed in predictive validity?
What is the assumed connection between these two measures?
How can we determine if a measure has predictive validity?
What statistical analysis is done to determine predictive validity?
How can the test results of predictive validity be interpreted?
4. Concurrent Validity
A school Guidance Counsellor administered a math achievement test
to Grade 6 students. She also has a copy of the students’ grades in math.
She wanted to verify if the math grades of the students are measuring the
same competencies as the math achievement test. The school counsellor
correlated the math achievement scores and math grades to determine if they
are measuring the same competencies.
What needs to be available when conducting concurrent validity?
At least how many tests are needed for conducting concurrent validity?
What statistical analysis can be used to establish concurrent validity?
How are the results of a correlation coefficient interpreted for concurrent
validity?
5. Construct Validity
A science test was made by a Grade 10 teacher composed of four
domains: matter, living things, force and motion, and earth space. There are
10 items under each domain. The teacher wanted to determine if the 10 items
made under each domain really belonged to that domain. The teacher
consulted an expert in test measurement. They conducted a procedure called
factor analysis. Factor analysis is a statistical procedure done to determine
if the items written load under the domain to which they belong.
What type of test requires construct validity?
What should the test have in order to verify its constructs?
What are constructs and factors in a test?
How can these factors be verified if they are appropriate for the test?
What results come out in construct validity?
6. Convergent Validity
A Math teacher developed a test to be administered at the end of the
school year, which measures number sense, patterns and algebra,
measurement, geometry, and statistics. It is assumed by the math teacher
that students’ competencies in number sense improve their capacity to learn
patterns and algebra and other concepts. After administering the test, the
scores were separated for each area, and these five domains were inter-
correlated using Pearson r. The positive correlation between number sense
and patterns and algebra indicates that, when number sense scores increase,
the patterns and algebra scores also increase. This shows that students'
learning of number sense scaffolds their patterns and algebra competencies.
What should a test have in order to conduct convergent validity?
What are done with the domains in a test on convergent validity?
What analysis is used to determine convergent validity?
How are the results in convergent validity interpreted?
7. Divergent Validity
An English teacher taught metacognitive awareness strategy to
comprehend a paragraph for Grade 11 students. She wanted to determine if
the performance of her students in reading comprehension would reflect well
in the reading comprehension test. She administered the same reading
comprehension test to another class which was not taught the metacognitive
awareness strategy. She compared the results using a t-test of independent
samples and found that the class that was taught metacognitive awareness
strategy performed significantly better than the other group. The test has
divergent validity.
What conditions are needed to conduct divergent validity?
What assumption is being proved in divergent validity?
What statistical analysis can be used to establish divergent validity?
How are the results of divergent validity interpreted?
Test Reliability
Reliability is not concerned with intent; instead, it asks whether the
test used to collect data produces consistent results. In this context,
accuracy is defined by consistency, that is, whether the results could be
replicated.
Also, reliability is the consistency of the responses to a measure under
three conditions:
1. when retested on the same person;
2. when retested on the same measure; and
3. similarity of responses across items that measure the same
characteristic.
In the first condition, a consistent response is expected when the test is
given to the same participants a second time. In the second condition,
reliability is attained if the responses to the test are consistent with
those on an equivalent or parallel test that measures the same characteristic
when administered at a different time. In the third condition, there is
reliability when the person responds in the same way or consistently
across items that measure the same characteristic.
There are different factors that affect the reliability of a measure. The
reliability of a measure can be high or low, depending on factors such as
the following:
1. The number of items in a test – The more items a test has, the
higher the likelihood of reliability. The probability of obtaining consistent
scores is high because of the large pool of items.
1. Linear regression
Linear regression is demonstrated when you have two measured variables,
such as two sets of scores on a test taken at two different times by the same
participants. When the two sets of scores are plotted in a graph (with an X-
and a Y-axis), they tend to form a straight line. The straight line formed by
the two sets of scores can produce a linear regression. When a straight line
is formed, we can say that there is a correlation between the two sets of
scores. This correlation is shown in the graph given, which is called a
scatterplot. Each point in the scatterplot is a respondent with two scores
(one for each test).
Formula:

r = [N∑XY – (∑X)(∑Y)] / √{[N∑X² – (∑X)²][N∑Y² – (∑Y)²]}

where
∑X – sum of all the X scores (Monday scores)
∑Y – sum of all the Y scores (Tuesday scores)
∑X² – sum of the squared values of X
∑Y² – sum of the squared values of Y
∑XY – sum of the products of the X and Y scores
N – number of paired scores

Substituting the values for the two sets of scores yields r = 0.80.
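As a sketch, the raw-score Pearson r computation described above can be written directly from the column sums. The Monday/Tuesday scores below are hypothetical, not the ones behind the 0.80 reported in the text.

```python
import math

def pearson_r(x, y):
    """Raw-score Pearson r from N, sum X, sum Y, sum X^2, sum Y^2, sum XY."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sx2 = sum(a * a for a in x)
    sy2 = sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt(
        (n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))

monday = [12, 15, 11, 18, 14]    # hypothetical first-administration scores
tuesday = [13, 16, 10, 19, 15]   # hypothetical retest scores
print(round(pearson_r(monday, tuesday), 2))
```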
Student   Item 1  Item 2  Item 3  Item 4  Item 5  Total (X)  Score–Mean  (Score–Mean)²
A           5       5       4       4       1        19          2.8          7.84
B           3       4       3       3       2        15         -1.2          1.44
C           2       5       3       3       3        16         -0.2          0.04
D           1       4       2       3       3        13         -3.2         10.24
E           3       3       4       4       4        18          1.8          3.24
Total (∑X)  14      21      16      17      13    Mean = 16.2   ∑(Score–Mean)² = 22.8
Mean        2.8     4.2     3.2     3.4     2.6   Variance of totals = 22.8/4 = 5.7
SD²         2.2     0.7     0.7     0.3     1.3   ∑SD² = 5.2
The Cronbach’s alpha formula is given by:

α = [k / (k – 1)] × [1 – (∑SD²item / SD²total)]

Hence,

α = (5/4) × (1 – 5.2/5.7) ≈ 0.11
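The table's computations can be reproduced in a few lines. This sketch uses sample variances (n – 1 in the denominator), matching the tabled values of 5.7 for the totals and 5.2 for the sum of item variances.

```python
def cronbach_alpha(item_scores):
    """item_scores: one inner list per item, each holding the students' scores."""
    k = len(item_scores)        # number of items
    n = len(item_scores[0])     # number of students

    def var(xs):                # sample variance (n - 1 denominator)
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    totals = [sum(item[i] for item in item_scores) for i in range(n)]
    return (k / (k - 1)) * (1 - sum(var(it) for it in item_scores) / var(totals))

# Items 1-5 for students A-E, read column-wise from the table above
items = [
    [5, 3, 2, 1, 3],
    [5, 4, 5, 4, 3],
    [4, 3, 3, 2, 4],
    [4, 3, 3, 3, 4],
    [1, 2, 3, 3, 4],
]
print(round(cronbach_alpha(items), 2))  # ≈ 0.11
```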
The scores given by the three raters are first computed by summing up
the total rating for each demonstration. The mean is obtained for these sums
of ratings (X̄ = 8.4). The mean is subtracted from each sum of ratings to give
a difference (D). Each difference is squared (D²), then the sum of squares is
computed (∑D² = 33.2). The mean and the sum of squared differences are
substituted into the Kendall's W formula. In the formula, m is the number of
raters while k is the number of students who performed the demonstrations.
Let us consider the formula and the substitution of values:
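The Kendall's W formula referred to is W = 12∑D² / [m²k(k² – 1)]. The individual rating sums are not shown in this excerpt, so the ones below are hypothetical, chosen only to match the summary values the text reports (mean of the sums = 8.4, ∑D² = 33.2, m = 3 raters, k = 5 students):

```python
def kendalls_w(rating_sums, m):
    """W = 12 * sum(D^2) / (m^2 * k * (k^2 - 1)), where D = sum - mean of sums."""
    k = len(rating_sums)
    mean = sum(rating_sums) / k
    ss = sum((s - mean) ** 2 for s in rating_sums)  # sum of squared deviations
    return 12 * ss / (m ** 2 * k * (k ** 2 - 1))

sums = [5, 6, 9, 10, 12]   # hypothetical sums of the three raters' scores
print(round(kendalls_w(sums, m=3), 2))  # ≈ 0.37
```

Identical rankings from all raters (e.g., rank sums 3, 6, 9, 12, 15) give W = 1, the ceiling of the coefficient.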
Summary
A test is valid when it measures what it is supposed to measure. It can be
categorized as face, content, construct, predictive, concurrent, convergent,
or divergent validity.
Reliability is the consistency of the responses to a measure. It can be
established through test-retest, parallel-forms, split-half, internal-
consistency, and inter-rater reliability.
Enrichment
A. Get a journal article about a study that developed a measure or conducted
validity or reliability tests. You may also download one from any of the
following open-access sources.
Google Scholar
Directory of open access journals
Multidisciplinary open access journals
Allied academics journals
Your task is to write a short report focusing on important information on
how the authors conducted and established test validity and reliability.
Provide the following information.
1. Purpose of the study
2. Describe the instrument with its underlying factors
3. Validity technique used in the study and analysis they used
4. Reliability techniques used in the study and analysis used
5. Results of the tests validity and reliability
B. Learn more on Reliability and Validity in Student Assessment by watching
a clip from http://www.youtube.com/watch?v=gzv8Cm1jC4M.
Assessment
A. Indicate the type of reliability applicable for each case. Write the type
of reliability on the space before the number.
Reliability Cases
Type
1. Mr. Perez conducted a survey of his students to determine
their study habits. Each item is answered using a five-point
scale (always, often, sometimes, rarely, never). He wanted
to determine if the responses for each item are consistent.
What reliability technique is recommended?
2. A teacher administered a spelling test to her students. After
a day, another spelling test was given with the same length
and stress of words. What reliability can be used for the
two spelling tests?
3. A PE teacher requested two judges to rate the dance
performance of her students in physical education. What
reliability can be used to determine the reliability of the
judgements?
4. An English teacher administered a test to determine
students’ use of verb given a subject with 20 items. The
scores were divided into items 1 to 10, and another for
items 11 to 20. The teacher correlated the two set of
scores that form the same test. What reliability is done
here?
5. A computer teacher gave a set of typing tests on
Wednesday and gave the same set the following week.
The teacher wanted to know if the students’ typing skills
are consistent. What reliability can be used?
B. Indicate the type of validity applicable for each case. Write the type of validity on
the blank before the number.
1. The science coordinator developed a science test to determine
who among the students will be placed in an advanced science
section. The students who scored high in the science test were
selected. After two quarters, the grades of the students in the
advanced science were determined. The scores in the science
test were correlated with the science grades to check if the
science test was accurate in the selection of students. What type
of validity was used?
Your task is to determine whether the spelling test is reliable and valid
using the data to determine the following: (1) split-half, (2) Cronbach’s
alpha, (3) predictive validity with the English grades, (4) convergent validity
between words with one stress and words with two stresses, and (5) difficulty
index of each item.
Student Item Item Item Item Item Item Item Item Item Item English
No. 1 2 3 4 5 6 7 8 9 10 grades
1 1 0 0 1 1 1 0 1 1 0 80
2 0 0 0 1 1 1 1 1 0 0 81
3 1 1 0 0 1 0 1 0 1 1 83
4 0 1 0 0 1 1 1 1 1 0 85
5 0 1 1 0 1 1 1 0 1 1 84
6 1 0 1 0 1 1 1 1 1 1 89
7 1 0 1 1 1 1 1 1 0 1 87
8 1 1 1 0 1 1 1 1 1 1 87
9 1 1 1 1 1 1 1 1 0 1 89
10 1 1 1 1 0 0 1 1 1 1 90
11 0 1 1 1 0 1 1 1 1 0 90
12 1 0 1 1 1 1 1 1 1 1 87
13 1 1 1 1 1 1 1 0 1 1 88
14 1 1 0 1 1 1 1 1 1 1 88
15 1 1 1 1 1 0 1 1 0 1 85
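For item (1) of this task, a first-half/second-half split with the Spearman-Brown correction can be sketched as follows. The split (items 1-5 vs. 6-10) is one common choice, and the helper names are ours; the 0/1 matrix is the one tabulated above, without the English grades.

```python
import math

def pearson_r(x, y):
    """Raw-score Pearson correlation between two score lists."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    num = n * sum(a * b for a, b in zip(x, y)) - sx * sy
    den = math.sqrt((n * sum(a * a for a in x) - sx ** 2)
                    * (n * sum(b * b for b in y) - sy ** 2))
    return num / den

def split_half_reliability(matrix):
    """Correlate the two half scores, then step up with Spearman-Brown."""
    half1 = [sum(row[:5]) for row in matrix]   # items 1-5
    half2 = [sum(row[5:]) for row in matrix]   # items 6-10
    r = pearson_r(half1, half2)
    return 2 * r / (1 + r)                     # full-length estimate

# 0/1 responses of the 15 students (item columns 1-10 from the table)
data = [
    [1,0,0,1,1,1,0,1,1,0], [0,0,0,1,1,1,1,1,0,0], [1,1,0,0,1,0,1,0,1,1],
    [0,1,0,0,1,1,1,1,1,0], [0,1,1,0,1,1,1,0,1,1], [1,0,1,0,1,1,1,1,1,1],
    [1,0,1,1,1,1,1,1,0,1], [1,1,1,0,1,1,1,1,1,1], [1,1,1,1,1,1,1,1,0,1],
    [1,1,1,1,0,0,1,1,1,1], [0,1,1,1,0,1,1,1,1,0], [1,0,1,1,1,1,1,1,1,1],
    [1,1,1,1,1,1,1,0,1,1], [1,1,0,1,1,1,1,1,1,1], [1,1,1,1,1,0,1,1,0,1],
]
print(round(split_half_reliability(data), 2))
```

An odd-even split is an equally defensible choice and generally gives a different estimate.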
D. Create a short test and report its validity and reliability. Select a grade
level and subject. Choose one or two learning competencies and make at
least 10-20 items for these two learning competencies. Consult your
teacher on the items and the table of specification.
1. Have your items checked by experts if they are aligned with the
selected competencies.
2. Revise your items based on the reviews provided by the experts.
3. Make a layout of your test and administer it to about 100 students.
4. Encode your data; you may use an application to compute the
needed statistical analysis.
5. Determine the following:
Split-half reliability
Cronbach’s alpha
Item difficulty and discrimination
Write a report on your procedure. The report will contain the following parts:
Introduction. Give the purpose of the study. Describe the test
measures, its component, the competencies selected, and kind of items.
Rationalize the need to determine the validity and reliability of the test.
Method. Describe the participants who took the test. Describe what the
test measures, number of items, test format, and how content validity was
established. Describe the procedure on how data was collected or how the
test was administered. Describe what statistical analysis was used.
Results. Present the results in a table and provide the necessary
interpretations. Make sure to show the results of the split-half reliability,
Cronbach’s alpha, construct validity of the items with the underlying factors,
convergent validity of the domains, and item difficulty and discrimination.
Discussion. Provide implications about the test validity and reliability.
E. Multiple Choice
Choose the letter of the correct and best answer in every item.
1. Which is a way in establishing test reliability?
A. The test is examined if free from errors and properly administered.
B. Scores in a test with different versions are correlated to test if they
are parallel.
C. The components or factors of the test contain items that are
strongly uncorrelated.
D. Two or more measures are correlated to show the same
characteristics of the examinee.
2. What is being established if items in the test are consistently answered
by the students?
A. Internal consistency C. test-retest
B. Inter-rater reliability D. split-half
3. Which type of validity was established if the components or factors of a
test are hypothesized to have a negative correlation?
A. Construct validity C. Content validity
B. Predictive validity D. Divergent validity
4. How do we determine if an item is easy or difficult?
A. An item is easy if majority of students are not able to provide the
correct answer. The item is easy if majority of the students are able
to answer correctly.
B. An item is difficult if majority of students are not able to provide the
correct answer. The item is difficult if majority of the students are
able to answer correctly.
C. An item can be determined difficult if the examinees who scored high
on the test answer the item correctly more often than the examinees
who got low scores. If not, the item is easy.
D. An item can be determined easy if the examinees who scored high on
the test answer the item correctly more often than the examinees who
got low scores. If not, the item is difficult.
5. Which is used when the scores of the two variables measured by a test
taken at two different times by the same participants are correlated?
A. Pearson r correlation C. Significance of the correlation
B. Linear regression D. positive and negative correlation
References
David et al. (2020). Assessment in Learning 1. Manila: Rex Book Store.
De Guzman, E. and Adamos, J. (2015). Assessment of Learning 1. Quezon
City: Adriana Publishing Co., Inc.
Exploring Reliability in Academic Achievement. Retrieved from
https://chfasoa.uni.edu/reliabilityandvalidity.htm
Price et al. (2017). Reliability and Validity of Measurement. In Research
Method in Psychology (3rd ed.). California, USA: The Saylor
Foundation. Retrieved from
https://opentext.wsu.edu/carriecuttler/chapter/reliability-and-validity-of-
measurement/
Professional Testing, Inc. (2020). Building High Quality Examination
Programs. Retrieved from
http://www.proftesting.com/test_topics/steps_9.php
The Graide Network, Inc. (2019). Importance of Validity and Reliability in
Classroom Assessments. Retrieved from
https://www.thegraidenetwork.com/blog-all/2018/8/1/the-two-keys-to-
quality-testing-reliability-and-validity
CHAPTER 4
ORGANIZATION, UTILIZATION, AND COMMUNICATION OF TEST
RESULTS
Overview
As we have learned in previous lessons, tests used to measure
learning or achievement are a form of assessment. They are undertaken to
gather data about student learning. These test results can assist teachers and
the school in making informed decisions to improve curriculum and
instruction. Thus, collected information such as test scores should be
organized to appreciate its meaning. Usually, the use of charts and tables are
ERNIE C. CERADO, PhD/MA. DULCE P. DELA CERNA, MIE 158
SULTAN KUDARAT STATE UNIVERSITY
Objective
Upon completion of the chapter, the students can demonstrate their
knowledge, understanding and skills in organizing, presenting, utilizing and
communicating the test results.
Pre-discussion
What to Expect?
At the end of the lesson, the students can:
1. organize the raw data from a test;
2. construct a frequency distribution;
3. acquire knowledge on the basic rules in preparing tables and graphs;
4. summarize test data using an appropriate table or graph;
5. use Microsoft Excel to construct appropriate graphs for a data set;
6. interpret the graph of a frequency and cumulative frequency
distribution; and
7. characterize a frequency distribution graph in terms of skewness and
kurtosis.
Frequency Distribution
Somewhat agree 15
Not sure 20
Somewhat disagree 20
Strongly disagree 15
Total 100
Step 2:
The second step is to decide the number and size of the groupings to be
used; in this process, first decide the size of the class interval.
According to H.E. Garrett (1985:4), the most “commonly used grouping
intervals are 3, 5, 10 units in length.” The size should be such that the number of
Step 3:
Prepare the class intervals. It is natural to start the intervals with their
lowest scores at multiples of the size of the intervals. For example, when the
interval is 3, it has to start with 9, 12, 15, 18, etc. Also, when the interval is 5,
it can start with 5, 10, 15, 20, etc.
The class intervals can be expressed in three different ways:
First Type:
The first type of class interval includes all scores.
For example:
10 - 15 includes scores of 10, 11, 12, 13 and 14 but not 15
15 - 20 includes scores of 15, 16, 17, 18 and 19 but not 20
20 - 25 includes scores of 20, 21, 22, 23 and 24 but not 25
In this type of classification, the upper limit of each class is repeated
as the lower limit of the next class.
This repetition can be avoided in the following type.
Second Type:
In this type the class intervals are arranged in the following way:
10 - 14 includes scores of 10, 11, 12, 13 and 14
15 - 19 includes scores of 15, 16, 17, 18 and 19
20 - 24 includes scores of 20, 21, 22, 23 and 24
Here, there is no question of confusion about the scores in the higher
and lower limits as the scores are not repeated.
Third Type:
Sometimes, we are confused about the exact limits of class intervals
because very often it is necessary in computations to work with exact limits.
A score of 10 actually extends from 9.5 to 10.5, and a score of 11 from 10.5
to 11.5.
Thus, the interval 10 to 14 actually contains scores from 9.5 to 14.5. The
same principle holds no matter what the size of interval or where it begins in
terms of a given score. In the third type of classification we use the real lower
and upper limits.
9.5 - 14.5
14.5 - 19.5
19.5 - 24.5 and so on.
Step 4:
Once we have adopted a set of class intervals, we need to list the
scores in their respective class intervals. Then, we have to put tallies in
their proper intervals. (See illustration in Table 1.)
Step 5:
Make a column to the right of the tallies headed “f” (frequency). Write
the total number of tallies for each class interval under column f. The sum of
the f column will be the total number of cases, N.
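Steps 1-5 can be sketched together in code. The scores below are hypothetical; the intervals use the second type of limits (e.g., 10-14), start at multiples of the interval size, and the f column sums to N:

```python
def frequency_distribution(scores, width=5):
    """Tally scores into class intervals of the given width (second-type limits)."""
    lo = (min(scores) // width) * width      # lowest interval starts at a multiple
    hi = (max(scores) // width) * width
    table = []
    for start in range(hi, lo - 1, -width):  # list the highest interval first
        f = sum(start <= s < start + width for s in scores)
        table.append((f"{start}-{start + width - 1}", f))
    return table

scores = [12, 14, 15, 17, 18, 20, 21, 21, 23, 24]   # hypothetical raw scores
for interval, f in frequency_distribution(scores):
    print(interval, f)
```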
The next matrix contains the scores of students in mathematics.
Tabulate the scores into frequency distribution using a class interval of 5
units.
Solution:
45 - 49 3 47 56 7 93% 12%
40 - 44 2 42 58 4 97% 7%
35 - 39 2 37 60 2 100% 3%
7. The graphic form of data is also a very useful device to suggest the
direction of investigations. Investigations cannot be conducted without
regard to the desired aim, and the graphic form helps fulfill that aim by
suggesting the direction of investigations.
8. In short, the graphic form of statistical data converts complex and huge
data into a readily intelligible form and introduces an element of
simplicity into it.
The graph's vertical axis should always start with zero. A usual type of
distortion is starting this axis with values higher than zero. Whenever it
happens, differences between variables are overestimated, as can been seen
in Figure 1.
3. Bar Graph
This graph is often used to present frequencies in categories of a
qualitative variable. It looks very similar to a histogram, constructed in the
same manner, but spaces are placed in between the consecutive bars.
The columns represent the categories, and the height of each bar, as in a
histogram, represents the frequency. If experimental data are graphed, the
independent variable in categories is usually plotted on the x-axis. When the
variable on the horizontal or x-axis is categorical, bar graphs can also be
presented horizontally. Bar graphs are very useful in comparing the test
performance of groups categorized in two or more variables. Following are
some examples of bar graphs.
Selection of the most appropriate graph for a given set of data can be
facilitated by some computer software or applications. A common application
is the Chart Wizard in Microsoft Excel which offers an array of different charts
along with several variants.
are empirical data that illustrate situations in the real world. With the world
population reaching 7.6 billion, you can imagine hundreds of possible
frequency distributions representing different groups and subgroups taken
from an infinitely large population. It is reasonable to expect that there will be
variations in the shapes of frequency distributions. Researchers, scientists,
and educators have found that empirical data, when recorded, fit the following
shapes of frequency distributions.
What is skewness?
Examine the graphs below.
The graphs of Figures 10c and 10d are asymmetrical in shape. The
degree of asymmetry of a graph is called skewness. Basic principles of a
coordinate system tell us that, as we move toward the right of the x-axis, the
numerical value increases; likewise, as we move up the y-axis, the scale value
becomes higher. Thus, in a negatively skewed distribution, more students get
higher scores, and the tail of lower frequencies points to the left, toward the
lower scores. On the other hand, in a positively skewed distribution, the
scores cluster on the left side. This means that more students get lower
scores, and the tail of lower frequencies points to the right, toward the
higher scores.
The graph in Figure 10b is a rectangular distribution. It occurs when the
frequency of each score or class interval is the same or nearly so, which is
why it is also called a uniform distribution.
We have differentiated the four graphs in terms of skewness, which
refers to their symmetry or asymmetry (non-symmetry). Another way of
characterizing a frequency distribution is with respect to the number of
“peaks” seen on the curve. Refer to the following graphs.
What is kurtosis?
Another way of contrasting frequency distributions is illustrated below.
Let us consider the graphs of three frequency distributions in Figure 13.
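Skewness and kurtosis can also be computed rather than only read off a graph. Below is a minimal pure-Python sketch using the moment definitions; the score lists are hypothetical examples, not data from the module.

```python
def skewness(xs):
    """Moment coefficient of skewness: m3 / m2^(3/2).
    Negative: tail toward lower scores; positive: tail toward higher scores."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

def kurtosis_excess(xs):
    """Excess kurtosis: m4 / m2^2 - 3.
    Positive: heavy tails; negative: light tails, relative to a normal curve."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / m2 ** 2 - 3

symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]
right_tailed = [1, 1, 1, 2, 2, 3, 4, 8]
print(skewness(symmetric))     # ≈ 0 for a symmetric set
print(skewness(right_tailed))  # > 0: tail toward the higher scores
```

A symmetric set of scores gives a skewness of about zero, while the set with one very high score is positively skewed, matching the descriptions above.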
Summary
Test data are better appreciated and communicated if they are arranged,
organized, and presented in a clear and concise manner.
A frequency distribution is a list, table or graph that displays the frequency
of various outcomes in a sample. Each entry in the table contains the
frequency or count of the occurrences of values within a particular group
or interval.
There are steps to follow in constructing a frequency distribution.
Tables and graphs are common tools that help readers better understand
the test results.
The graphic method is mainly used to give a simple, permanent idea and
to emphasize the relative aspect of data.
Tabulation of statistical data is a prerequisite to graphic
presentation.
Data are plotted on a graph from a table. This means that graphic form
cannot replace tabular form of data but can definitely supplement it.
Skewness is a measure of symmetry, or more precisely, the lack of
symmetry. A distribution, or data set, is symmetric if it looks the same to
the left and right of the center point.
Kurtosis is a measure of whether the data are heavy-tailed or light-tailed
relative to a normal distribution. Data sets with high kurtosis tend to have
heavy tails, or outliers, while data sets with low kurtosis tend to have light
tails, or lack of outliers.
Enrichment
1. Explore the Chart Wizard facility of Microsoft Excel application.
2. Read the following articles:
a. “How to Create a Chart in Excel using the Chart Wizard” from
https://www.middlesex.mass.edu/KB/Articles/Public/127/.
b. “Are the Skewness and Kurtosis Useful Statistics?” from
https://www.spcforexcel.com/knowledge/basic-statistics/are-
skewness-and-kurtosis-useful-statistics
3. Watch the following videos:
a. “MS Excel - Pie, Bar, Column & Line Chart” by Tutorials Point (India)
Ltd. (2018, January 15) from https://www.youtube.com/watch?
v=Z2gzLYaQatQ.
b. “How to Construct a Frequency Distribution Table” from
https://www.youtube.com/watch?v=j6ftiC2o6O4.
Assessment
A. Let us see how well you understood what have been presented in this
lesson.
1. Consider the table showing the results of a reading examination of a set of students.
Class Interval   Midpoint   f    Cumulative    Cumulative
                                 Frequency     Percentage
140-144          142        2
135-139          137        7
130-134          132        9
125-129          127        14
120-124          122        10
115-119          117        6
110-114          112        2    2
g. The entry in the lowest class interval of the 4th column is done for
you. Starting from the lowest class interval, can you fill in the remaining
blanks upward? How did you do it?
h. Look at the entire column on cumulative frequency. What is the
cumulative frequency of the highest class interval? How do you
compare this cumulative frequency with the number of students
who took the test?
i. The last column is labeled cumulative percentage. What should be
the first entry at the bottom of the column? How did you determine
it? Can you fill in the entire column with the right percentages? Can
you do this in two ways? Which way is easier?
j. Take a look at the values in the table, in particular, the frequency
column. What type of distribution (positively skewed, negatively
skewed, and symmetrical) is depicted by the given values? Why do
you say so?
k. What type of graph is most appropriate for this frequency table?
2. Analyze the figures in the succeeding pages and answer the questions
that pertain to each graph.
For Figure 15:
a. What is the shape of the frequency distributions as to symmetry?
b. What is the estimated value of the highest score in each
distribution? What does this value indicate?
c. Which section got the highest average? Which section got the
lowest?
3. Now, to further see how well you were able to comprehend all the topics
discussed earlier, fill in the answer to each box in the diagram below.
B. Accomplish the following activities to know the extent to which you have
understood the concepts introduced in this lesson.
1. The following aptitude test scores have been recorded in a guidance office.
140 88 115 91 96
93 117 99 101 108
98 123 119 146 107
107 111 100 125 110
83 127 116 113 104
126 114 110 114 138
109 102 113 106 90
107 91 102 103 135
104 101 131 87 124
113 135 126 112 140
5. Cumulative percentage.
f. Construct a histogram from the given scores.
g. Draw a frequency polygon superimposed on the histogram you
constructed in (f).
h. Using your data in (e.5), draw a cumulative percentage polygon.
Figure 19 shows a graph constructed from a first-quarter exam in Science
gathered from 193 STEM students: 100 males and 93 females. Give three
statements on the test performance of STEM students as depicted in the figure.
Self-Confidence Inventory
Put a check mark (✓) in the column that best describes how you see yourself in
the following situations. There are no right or wrong responses, so feel free to
express your true self. Results will be kept strictly confidential.
Scale: Always (5), Almost Always (4), Sometimes (3), Seldom (2), Never (1)
1. I feel that I have a number of positive qualities.
2. I feel I am a worthy person to my family, friends, and classmates.
3. I am inclined to think I am a failure.
4. I have as many accomplishments as others of my age have.
5. I feel I do not have much to be proud of in my family.
6. I am happy with who I am.
7. I feel I have not contributed much as a son/daughter to my parents.
8. I feel that my classmates are afraid to approach me for help.
9. I am afraid to make mistakes.
10. I am not bothered about what people say about me.
11. With how I am going, the future will be bright for me.
12. I get excited when I try new things.
13. I cannot sleep when I hear negative things about me.
14. I am as important as other people.
15. I feel depressed when I do not succeed in what I plan to achieve.
4. What period shows the highest increase in the number of students passing the subject?
A. 1st Quarter
B. 2nd Quarter
C. 3rd Quarter
D. 4th Quarter
5. What is the rate of increase of passing from the 2nd to 3rd quarter?
A. 75%
B. 50%
C. 33%
D. 25%
F. Supplemental Exercises
1. The following is a frequency distribution of examination marks:
Class interval f
90 – 94 6
85 – 89 9
80 – 84 7
75 – 79 13
70 – 74 14
65 – 69 19
60 – 64 11
55 – 59 11
50 – 54 9
45 – 49 8
40 – 44 8
Answer the following questions. You are free to consult your teacher
should you have concerns over these exercises.
a. What is the size of the class interval?
Educators’ inputs
I have been teaching statistics for many years at both undergraduate
and graduate programs in the College of Education. I am happy with the
illustration of statistics in the area of assessment and evaluation. It gave me
the opportunity to teach statistics in different contexts. It provided me with a
practical application of statistics in the assessment of students’ learning.
Topics in statistics like data handling can be boring to many. When I present,
for example, test score data with 100 observations, my students usually just
look at the data and wait for my next step. When I group data into a frequency
distribution table, I see my students amazed with the summarized and
condensed form of information. When I create a graph out of the frequency
table for grouped data, I see students becoming more attentive with pictures
and graphs. However, drawing graphs in the traditional way, with different
steps to follow and do's and don'ts, can cause anxiety for many. When I
create graphs, I usually do it through SPSS software, which has been an
indispensable tool in my statistics class. When I use this software vis-à-vis the
traditional way of computing, students are amazed. While I should be happy
when my students show interest in what they learn in my class, I am also
concerned that there is an erosion in my students' ability to read hidden
information in a graph, which a teacher should be concerned about. Being
mindful of this, I have some favorite lines that I use when teaching the use of
tables and graphs in organizing and presenting test data:
It seems you prefer seeing pictures rather than tables and reading text.
But let me give you some words of caution. When you read materials with
pictures or graphs in published works on competitive achievement tests, or
other forms of advertisement and reports, be critical. A picture or a graph may
be deceiving. With some tricks, you can be misled. With some mechanical
manipulations, like compressing or expanding graphs with incorrect scaling,
you can be deceived. Do not rely completely on the visuals, examine the
underlying information and detect the missing information.
Pre-discussion
Discussions in this lesson will build upon the concepts and examples
presented in the preceding lesson, which focused on the tabular and graphical
presentation and interpretation of test results. This time, other ways of
summarizing test data using descriptive statistics, which provides a more
precise means of describing a set of scores, will be introduced. The word
“measures” is commonly associated with numerical and quantitative data.
Hence, the prerequisite to understanding the concepts in this lesson is your
basic knowledge of mathematics, e.g., summation of values, simple
operations on integers, squaring and finding square roots, etc.
What to Expect?
At the end of the lesson, the students can:
1. find the mean, median, and mode of test score distribution;
2. determine the different measures of dispersion of test scores;
3. calculate the measure of position;
4. relate standard deviation and normal distribution;
5. transform raw scores to standardized scores (z, T and stanine);
6. compute the measure of covariability using the long process and
Excel; and
7. interpret test data applying measures of central tendency, variability,
position, and covariability.
Mean. This is the most preferred measure of central tendency for use
with test scores, also referred to as the arithmetic mean. The computation is
very simple. When a student has added up the examination scores he/she
made in a subject during the grading period and divided the sum by the
number of examinations taken, then he/she has computed the arithmetic mean.
That is, X̄ = ΣX / N, where X̄ = the mean, ΣX = the sum of all the scores, and N = the number of scores.
The mean is the sum of all the scores from 53 down to the last score,
which is 35, divided by the total number of cases.
That is,
X̄ = ΣX / N = (53 + 36 + 57 + … + 60 + 49 + 35)/100
There are many ways of computing the mean. The traditional long and
tedious computation techniques have outlasted their relevance due to the
advancement of technology and the emergence of statistical software. Using
your scientific calculator, you will see the relevant statistical symbols; just
follow the simple steps indicated in its guide. There are also simple steps in Microsoft
Excel. Different versions of the statistical software SPSS offer the fastest way
of obtaining the mean, even with hundreds of scores in a set. There is no loss
of original information because you are dealing with original individual scores.
The use of statistical software will be explained later.
When scores are grouped in the traditional way, you can see at a glance
how they are distributed among the range of values in a condensed
manner. You can even estimate the average of the scores by looking at the
frequency in each class interval. In the absence of a statistical program, the
mean of grouped data can be computed with the formula X̄ = ΣfXm / N,
where f is the frequency of each class interval and Xm its midpoint.
Median. It is the value that divides the ranked scores into halves, or the
middle value of the ranked scores. If the number of scores is odd, then there
is only one middle value, which gives the median. However, if the number of
scores in the set is even, then there are two middle values, and the median is
their average. But if there are more than 50 scores, arranging the scores and
finding the middle value will take time. For grouped data, this formula will
help you determine the median: Mdn = LL + ((N/2 − cf)/f) × i, where LL is
the exact lower limit of the median class, cf the cumulative frequency below
the median class, f the frequency of the median class, and i the class size.
4. Find the exact limits of the median class. In this case, the class is
44.5 - 49.5, so the lower limit is 44.5.
Summing up these steps and substituting these values to the formula,
we have:
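The grouped-data median computation outlined in the steps above can be sketched in Python. This is a minimal sketch using the standard formula Mdn = LL + ((N/2 − cf)/f) × i; the distribution below is hypothetical, not Table 2 from the module.

```python
def grouped_median(intervals):
    """Median from a grouped frequency distribution.

    `intervals` is a list of (lower_bound, frequency) pairs ordered from
    lowest to highest class, all with the same width. Applies
    Mdn = LL + ((N/2 - cf) / f) * i, using exact class limits.
    """
    width = intervals[1][0] - intervals[0][0]
    n = sum(f for _, f in intervals)
    half = n / 2
    cf = 0  # cumulative frequency below the candidate median class
    for lower, f in intervals:
        if cf + f >= half:
            exact_lower = lower - 0.5  # exact lower limit of the median class
            return exact_lower + (half - cf) / f * width
        cf += f

# Hypothetical distribution: (lower bound, frequency), class width 5
dist = [(35, 2), (40, 4), (45, 8), (50, 4), (55, 2)]
print(grouped_median(dist))  # 47.0
```

Here N = 20, the median class is 45–49 with exact lower limit 44.5, cf = 6, and f = 8, giving 44.5 + (10 − 6)/8 × 5 = 47.0.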
Mode. It is the easiest measure of central tendency to obtain. The mode is the
score or value with the highest frequency in the set of scores. If the scores are
arranged in a frequency distribution, the mode is estimated as the midpoint of
the class interval which has the highest frequency. This class interval with the
highest frequency is also called the modal class. In a graphical representation
of the frequency distribution, the mode is the value on the horizontal axis at
which the curve is at its highest point (peak). If there are two highest points,
then there are two modes, as discussed in a previous lesson. When all the
scores in a group have the same frequency, the group of scores has no mode.
Considering the test data in Table 2, it can be seen that the highest
frequency of 21 occurred in the class interval 45 - 49. The rough estimate of
the mode is 47, which is the midpoint of that class interval.
As manual computation of the mean, median, and mode is a lengthy
and tedious process, technology makes it simpler through the Microsoft
Excel application. Here is a simple guide to follow.
Calculating the mean of scores located in several columns and rows is
also possible, provided they are all selected or defined.
Median in Excel
Mode in Excel
Mode helps you find the value that occurs most often. When you are
working with a large amount of data, this function can be a lot of help. To
find the most frequently occurring value in Excel, use the MODE function and
select the range you want to find the mode of. In our example below, we use
=MODE(B2:B12), and since 2 students have scored 55, we get the answer
55.
In situations where there are two or more modes in your data set, the
Excel MODE function will return the lowest mode.
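If you do not have Excel at hand, the same three measures can be obtained with Python's standard statistics module. The scores below are hypothetical, not the B2:B12 data from the illustration.

```python
import statistics

# Hypothetical test scores (with 55 occurring twice, so it is the mode)
scores = [40, 45, 50, 55, 55, 60, 62, 70, 75, 80, 88]

mean = statistics.mean(scores)      # like =AVERAGE(B2:B12)
median = statistics.median(scores)  # like =MEDIAN(B2:B12)
mode = statistics.mode(scores)      # like =MODE(B2:B12)
print(mean, median, mode)
```

One design note: when a data set has several modes, Python 3.8+ returns the first one encountered, whereas Excel's MODE function, as mentioned above, returns the lowest.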
Scale of Measurement
There are four levels of measurement that apply to the treatment of test
data: nominal, ordinal, interval, and ratio. In nominal measurement, the
number is used for labeling or identification purposes only. An example is the
You can see that different distributions may be symmetrical, may have
same average values (mean, median, mode), but how the scores in each
distribution are spread out around these measures are different. In A, as
Range. It is the difference between the highest (XH) and the lowest (XL)
scores in a distribution. It is the simplest measure of variability but also
considered as the least accurate measure of dispersion because its value is
determined by just two scores in the group. It does not take into consideration
the spread of all scores; its value simply depends on the highest and lowest
scores. Its value could be drastically changed by a single value. Consider the
following examples:
Determine the range for the following scores: 9, 9, 9, 12, 12, 13, 15, 15,
17, 17, 18, 18, 20, 20, 20.
Range = XH - XL
= 20 – 9
Range= 11
Now, replace a high score in one of the scores, say, the last score and
make it 50. The range becomes:
Range = XH - XL
= 50 – 9
= 41
We observed that replacing just a single score increased the range greatly.
This can be interpreted as a large dispersion of test scores; however, when you
look at the individual scores, it is not.
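The two range calculations above can be reproduced in a couple of lines of Python, showing how a single outlier inflates the range:

```python
def score_range(scores):
    """Range: highest score minus lowest score."""
    return max(scores) - min(scores)

scores = [9, 9, 9, 12, 12, 13, 15, 15, 17, 17, 18, 18, 20, 20, 20]
print(score_range(scores))  # 20 - 9 = 11

# Replace the last score with 50: one outlier inflates the range
scores[-1] = 50
print(score_range(scores))  # 50 - 9 = 41
```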
Note that while the distributions contain different scores, they have the
same mean. If we ask how well each mean represents the scores in its
respective distribution, there will be no doubt about the mean of distribution C
because each score in the distribution is 12. How about in distributions A and
B? For these two distributions, the mean of 12 is a better estimate of the
scores in distribution B than in distribution A. We can see that no score in B is
more than 4 points away from mean of 12. However, in distribution A, half of
the 12 scores is 4 points or more away from the mean. We can also say that
there is less variability of scores in B than in A. However, we cannot just
determine which distribution is dispersed or not by merely looking at the
numbers, especially when there are many scores. We need a reliable index of
variability, such as variance or standard deviation that takes into consideration
all the scores.
Recall that ∑(X − X̄) is the sum of the deviation scores from the mean,
which is equal to zero. As such, we square each deviation score, then sum up
all the squared deviation scores, and divide the sum by the number of cases;
this yields the variance. Extracting its square root gives us the standard
deviation.
where μ = the population mean
Class A                         Class B
X     (X − X̄)    (X − X̄)²      X     (X − X̄)    (X − X̄)²
22    22 − 12    100            16    16 − 12    16
18    18 − 12    36             15    15 − 12    9
16    16 − 12    16             15    15 − 12    9
14    14 − 12    4              14    14 − 12    4
12    12 − 12    0              12    12 − 12    0
11    11 − 12    1              11    11 − 12    1
9     9 − 12     9              11    11 − 12    1
7     7 − 12     25             9     9 − 12     9
6     6 − 12     36             9     9 − 12     9
5     5 − 12     49             8     8 − 12     16
X̄ = 12   ∑(X − X̄)² = 276       X̄ = 12   ∑(X − X̄)² = 74
The values 276 and 74 are the sums of the squared deviations of scores
in Class A and Class B, respectively. If each is divided by the number of
scores in its class, this gives the variance (S²):
The values above are both in squared units, while our original scores
are not. When we find their square roots, we obtain values that are on the
same scale of units as the original set of scores. These give the respective
standard deviation (S) of each class, computed as follows:
When you are finding the variance using Excel, you can simply use the
VARP function (VAR.P in newer versions), select the range, and you will find
the desired variance. Note that VARP divides by the number of cases, as in
the manual computation above, while the VAR function divides by n − 1 and
returns the sample variance instead. We take the data in Class A to find the
variance of the scores obtained by students, so we use =VARP(A2:A11). In
the case of Class B, the same function is used, but the data in cells B2 to
B11 are considered. Thus, we use =VARP(B2:B11).
In both instances, the Excel values match the results of the manual
process. For a larger number of scores in a distribution, Microsoft Excel
or other software is more appropriate and efficient for obtaining the variance
and standard deviation. This can be done in a few seconds if you have already
entered and saved the data used to get the measures of central tendency. In
addition, the variance and standard-deviation functions can still be used even
if scores are encoded in several columns and rows, as shown in the next
illustration.
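The Class A and Class B computations can be verified with a short Python sketch. Like the manual process in the text, it divides the sum of squared deviations by the number of cases N (the population variance):

```python
from math import sqrt

def variance(scores):
    """Population variance: mean of the squared deviations from the mean."""
    n = len(scores)
    mean = sum(scores) / n
    return sum((x - mean) ** 2 for x in scores) / n

class_a = [22, 18, 16, 14, 12, 11, 9, 7, 6, 5]
class_b = [16, 15, 15, 14, 12, 11, 11, 9, 9, 8]

var_a = variance(class_a)  # 276 / 10 = 27.6
var_b = variance(class_b)  # 74 / 10 = 7.4
sd_a, sd_b = sqrt(var_a), sqrt(var_b)
print(var_a, var_b, round(sd_a, 2), round(sd_b, 2))
```

The smaller variance and standard deviation of Class B confirm that its scores are less spread out around the common mean of 12.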
Sk = 3(X̄ − Mdn) / SD
where Sk = skewness
X̄ = mean
Mdn = median
SD = standard deviation
As the difference between mean and median moves farther from 0, the
coefficient of skewness changes to either lower or higher values.
Measures of Position
While measures of central tendency and measures of dispersion are
used often in assessment, there are other methods of describing data
distributions such as using measures of position or location. It is about the
score’s position in the distribution. What are these measures?
Figure 6. Quartiles
Note that in the above example, the upper and lower 50% contain an
even number of center values, so the median in each half is the average of
the two middle values, giving Q = 10.
Decile. It divides the distribution into 10 equal parts. There are 9 deciles, such
that 10% of the distribution is equal to or less than decile 1 (D1), 20% of the
scores are equal to or less than decile 2 (D2), and so on. A student whose mark
is below the first decile is said to belong in decile 1. A student whose mark is
above the 9th decile belongs to decile 10. If there is only a small number of data
values, the decile is not appropriate to use.
Percentile. It divides the distribution into one hundred equal parts. In the
same manner, for percentiles, there are 99 percentiles such that 1% of the
scores are less than the first percentile, 2% of the scores are less than the
second percentile, and so on. For example, if you scored 95 in a 100-item
test, and your percentile rank is 99th, then this means that 99% of those who
took the test performed lower than you. This also means that you belong to
the top 1% of those who took the test. In many cases, percentiles are wrongly
interpreted as percentage score. For example, 75% as a percentage score
means you get 75 items correct out of a hundred items, which is a mark or
grade reflecting performance level. But a percentile is a measure of position,
such that a mark at the 75th percentile means 75% of the students who took
the test got a lower score than you, or your score is located at the upper 25%
of the class who took the same test. For very large data sets, the percentile is
appropriate to use for accuracy. This is one reason why percentiles are
commonly used in national assessments or university entrance examinations
with datasets of scores in the thousands.
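The idea of a percentile rank, the percentage of scores falling below a given score, can be sketched directly in Python. The scores below are hypothetical.

```python
def percentile_rank(scores, x):
    """Percentage of scores in the group that fall below x."""
    below = sum(1 for s in scores if s < x)
    return 100 * below / len(scores)

# Hypothetical test scores for a group of ten students
scores = [60, 65, 70, 72, 75, 78, 80, 85, 90, 95]
print(percentile_rank(scores, 85))  # 70.0: 7 of the 10 scores are below 85
```

A student who scored 85 here has a percentile rank of 70, meaning 70% of the group scored lower, not that the student answered 70% of the items correctly.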
standard deviation of P15.00. Can we say that the latter distribution is more
spread out than the former? Or can we compare standard deviations in meters
and in pesos? The answer seems obvious. We cannot conclude anything by
direct comparison of measures of absolute dispersion because they are in
different units or different categories. In the first example, one is a distribution
of mathematics scores while the other is a distribution of science scores. To
make the
comparison logical, we need a measure of relative dispersion, also known
as the coefficient of variation (CV). This is simply the ratio of the standard
deviation to the mean, expressed as a percentage. Suppose the mean score
of students in mathematics is 40 with a standard deviation of 10. The
coefficient of variation is:
CVmath = (10/40)(100)
= 0.25 (100)
= 25%.
Suppose the mean score of students in science is 18 with standard
deviation of 5. The coefficient of variation is:
CVsci = (5/18)(100)
= 0.277 (100)
≈ 28%.
Looking at 10 and 5 as the standard deviations of the mathematics and
science scores, respectively, you might be led to judge that the set of scores
in mathematics is twice as dispersed as the scores in science. From the
computed coefficients of variation as measures of relative dispersion, we can
clearly see that the scores in mathematics are in fact more homogeneous than
the scores in science.
deviation from the mean score. In other words, each portion under the curve
contains a fixed percentage of cases, as follows:
68% of the scores fall between one standard deviation below and
above the mean;
95% of the scores fall between two standard deviations below and
above the mean;
99.73% of the scores fall between three standard deviations below
and above the mean.
Figure 8 illustrates the theoretical model.
From the above figures, we can state the properties of the normal
distribution:
1. The mean, median, and mode are all equal.
2. The curve is symmetrical. As such, the area of any specific region on
the left is equal to the area of its corresponding region on the right.
3. The curve changes from concave to convex and approaches the X-
axis, but the tails never touch the horizontal axis.
4. The total area under the curve is equal to 1.
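The fixed percentages quoted above can be checked numerically from the standard normal curve using the error function in Python's math module:

```python
from math import erf, sqrt

def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def area_between(k):
    """Proportion of scores within k standard deviations of the mean."""
    return normal_cdf(k) - normal_cdf(-k)

print(round(area_between(1), 4))  # about 0.6827
print(round(area_between(2), 4))  # about 0.9545
print(round(area_between(3), 4))  # about 0.9973
```

These reproduce the 68%, 95%, and 99.73% figures for one, two, and three standard deviations around the mean.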
Standard Scores
In the preceding sections, we discussed raw scores, which are the
original scores collected from an actual testing activity. However, there are
situations where computing measures from raw scores may not be enough.
Consider a situation where you, as a student, want to know in which subjects
you performed best and poorest, to determine where you need to exert more
effort. Or maybe in the past, you took entrance examinations in more than one
university and asked yourself in which university you performed best. In cases
like these, we cannot find the answer by merely relying on a single score.
More concretely, if you get a score of 86 in English and 90 in Physics, you
cannot conclude that you performed better in Physics than in English, even
though 90 is higher than 86. Say you later learned that the mean score of the
class in English was 80. A score like 86 or 90 is not meaningful unless it is
compared with the other test scores. In particular, a score can be interpreted
more meaningfully if we know the mean and variability of the other scores to
which that single score belongs. Knowing these, a raw score can be converted
into a standard score.
Z-score. There are many kinds of standard scores. The most useful is the z-
score, which is often used to express a raw score's relation to the mean:
z = (X − X̄)/SD. Knowing the mean, you are able to tell whether your test
score, say X, is above or below the average
score. However, you cannot say whether your test score or grade is better or
worse than the average score. Again, the importance of knowing the standard
deviation is highlighted here. The standard deviation helps you locate the
relative position of the score in a distribution. The equation above gives you
the z-score, which can indicate the number of standard deviations the score
is above or below the mean. A z-score is called a standard score, simply
because it is a deviation score expressed in standard deviation units.
From the above, if 86 and 90 are your scores in the two subjects, you
can confidently say that, compared with the rest of your class, you performed
better in English than in Physics. That is because in English, your
performance is 2 standard deviations above the mean, while in Physics, you
are 2.5 standard deviations below the mean. While 90 is numerically higher,
it lies below the average performance of the class where you belong, while
86 is above the mean of its class, and even 2 standard deviations above it.
Having been transformed to the same scale, the two sets of scores can be
compared on one standard distribution.
T-Score. As you can see in the computation of the z-score, it can give a
negative number, which simply means the score is below the mean. However,
communicating a negative z-score as “below the mean” may not be
understandable to others. We would not even tell students that they got a
negative z-score. Also, a z-score may be a repeating or non-repeating
decimal, which may not be convenient for others. One option is to convert a
z-score into a T-score, which is a transformed standard score. To do this,
there is a scaling in which the z-score mean of 0 is transformed into a mean
of 50, and each z-score standard deviation unit is multiplied by 10. The
corresponding equation is:
T-score = 50 + 10z
For z = −2: T-score = 50 + 10(−2) = 50 − 20 = 30
For z = +2: T-score = 50 + 10(2) = 50 + 20 = 70
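The z-to-T conversion can be written out in a short sketch. For illustration, a class standard deviation of 3 in English is assumed, so that a score of 86 against a mean of 80 is 2 standard deviations above the mean, as in the earlier example.

```python
def z_score(x, mean, sd):
    """Standard deviations above (+) or below (-) the mean."""
    return (x - mean) / sd

def t_score(z):
    """T-score: mean of 50, each standard deviation worth 10 points."""
    return 50 + 10 * z

# Hypothetical: English score 86, class mean 80, assumed SD of 3
z = z_score(86, 80, 3)
print(z, t_score(z))   # 2.0 and 70.0
print(t_score(-2))     # 30.0, as in the worked example above
```

Note how the T-scale removes both negatives and decimals from typical results, which is exactly why it is easier to communicate than a raw z-score.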
Example:
Scores in the stanine scale have some limitations. Since they are on a 9-
point scale and expressed as whole numbers, they are not precise. Different
z-scores or T-scores may have the same stanine equivalent.
With the above percentage distribution of scores in each stanine, you can
directly convert a set of raw scores into stanine scores. Simply arrange the
raw scores from lowest to highest, and with the percentage of scores in each
stanine, we can directly assign the appropriate stanine score in each raw
score. On the interpretation of stanine scores, let us say Kate has a stanine
score of 2. We can see that her score lies in the low band between the 4th
and 11th percentile of the scores. In the same way, if John's score is in the
6th stanine, it falls between the 60th and 77th percentile, simply because 60
percent of the scores are below the 6th stanine and 23 percent of the scores
are above it. For qualitative description, stanine scores of 1, 2, and 3 are
considered as below average; 4, 5, and 6 are average, and 7, 8, and 9 are
above average. Thus, we can say that your score of 86 in English is above
average. Similarly, Kate’s score is below average while that of John is
average. Figure 11 illustrates the equivalence of the different commonly-used
standard scores.
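The percentile-to-stanine assignment described above can be sketched in code, using the standard stanine percentages (4, 7, 12, 17, 20, 17, 12, 7, 4 percent per stanine), whose cumulative cut-offs are 4, 11, 23, 40, 60, 77, 89, and 96:

```python
def stanine(percentile):
    """Stanine from a percentile rank, using the standard cumulative
    cut-offs (percent of scores below each stanine boundary)."""
    cutoffs = [4, 11, 23, 40, 60, 77, 89, 96]
    for i, cut in enumerate(cutoffs, start=1):
        if percentile < cut:
            return i
    return 9

print(stanine(50))  # 5: average
print(stanine(7))   # 2: below average (between the 4th and 11th percentile)
print(stanine(98))  # 9: above average
```

This matches the worked interpretations above: a percentile rank between 60 and 77 falls in the 6th stanine, and ranks between 4 and 11 fall in the 2nd.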
First, we will encode the two (2) sets of raw data representing X and Y
in either two columns or rows in Excel as shown in the next illustration. At any
free cell, type =CORREL(A3:A12,B3:B12) as reflected in the worksheet. The
A3:A12 are the cell addresses of the first variable X, while B3:B12 are the cell
addresses of the second variable Y. Take note that the cell ranges of the 2
variables are separated by a comma.
In contrast, the manual process is long and laborious, yet yields the
same result.
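The same computation that Excel's =CORREL performs can be written out in Python. This is a minimal sketch of the Pearson correlation coefficient; the paired scores below are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient, like =CORREL(range1, range2)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical paired scores of five students on two tests
x = [10, 12, 14, 16, 18]
y = [40, 44, 50, 55, 61]
print(round(pearson_r(x, y), 3))
```

A value near +1 indicates that students who scored high on one test tended to score high on the other; a value near −1 would indicate the opposite tendency.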
Summary
Measures of central tendency include the mean, median and mode.
Mean is the average of a given set of scores: add the scores, then
divide by the number of cases.
Median is the number in the middle when you order the numbers in an
ascending order. If there are two numbers in the middle, you should
take the average of those two numbers.
Mode is the number which is repeated the most in the set.
Measures of dispersion include the range, interquartile range, semi-
interquartile range, quartile deviation, variance and standard deviation.
The range of a dataset is the difference between the largest and
smallest values in that dataset.
The interquartile range is the middle half of the data that is in between
the upper and lower quartiles. Dividing this by 2 gives us the quartile
deviation.
Variance is the average squared difference of the values from the
mean. Unlike the previous measures of variability, the variance
includes all values in the calculation by comparing each value to the
mean.
The standard deviation is the standard or typical difference between
each data point and the mean.
Measures of location can be categorized as percentile, decile and
quartile.
A normal distribution is sometimes called the bell curve because its
distribution occurs naturally in many situations. The bell curve is
symmetrical where half of the data falls to the left of the mean and
another half falls to the right.
Standardized scores can be a z-score, T-score or stanine.
Measures of covariability tell us to a certain extent the relationship
between two tests or factors.
Assessment
A. Answer the following questions orally.
1. What are the measures of central tendency?
2. What are the measures of dispersion?
3. What are the measures of position?
4. What is covariability?
G. After all that you have done, can you now identify the elements that are
implied in the empty boxes below?
Activity 1
Form a group with four of your classmates. Here is the task you need to
work on with your team members.
Secure a set of old test papers that have been scored by a teacher or a
student teacher. It is advised that the number of cases be at least 100. It is
understood that the 100 scores came from the same test. See to it that
you observe the confidentiality of the documents you have requested. No
name should be identified in your written work; use codes to identify the
observations or cases.
Here are the tasks:
1. Prepare a data set for the test scores.
2. With the aid of your scientific calculator or Microsoft Office Excel,
find the following:
a. Mean
b. Median
c. Mode
3. Describe the type of distribution of test scores.
4. Which measure of central tendency is most appropriate to describe
the distribution? Why do you say so? Explain.
5. Select three (3) students in the list. Describe the test performance
of each student relative to the performance of all the students in the
whole class.
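For task 5, one common way to describe a student's standing relative to the whole class is a percentile rank. A sketch follows; the class scores and the convention of counting tied scores as half are assumptions for illustration:

```python
def percentile_rank(scores, raw_score):
    """Percent of scores in the group falling below a raw score.
    Ties are counted as half, a common convention (an assumption here)."""
    below = sum(1 for s in scores if s < raw_score)
    equal = sum(1 for s in scores if s == raw_score)
    return 100 * (below + 0.5 * equal) / len(scores)

# Hypothetical class scores; the student of interest scored 70
class_scores = [45, 50, 55, 60, 60, 65, 70, 75, 80, 90]
rank = percentile_rank(class_scores, 70)   # -> 65.0
```

A rank of 65 would be read as: the student scored as well as or better than about 65% of the class.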
Activity 2
Interview a teacher on how she decides to pass or fail a student. Report the
specific questions that you asked and the corresponding responses you
captured in this interview activity. Your analysis and presentation should
reflect the application of the measures of central tendency, measures of
variability, and measures of location.
Activity 3
With your team members, make a visit to any of the following offices of a
school:
1. Office of the Guidance Counselor
2. Testing Center
3. Office of Student Affairs
Request for any of the following data which you can avail of:
1. IQ test scores
2. Aptitude test scores
3. Admission test scores
4. Qualifying examination scores
5. Any psychological test scores
It is advised that the number of observations be at least 50; the larger
the number of cases, the better.
Please emphasize that the data will be held in full confidentiality. As an
ethical consideration, the office has the option not to give the names of the
examinees. Apply a coding technique to label the different observations.
From the test scores that you have gathered, do the following:
1. Make a frequency distribution of the test scores.
B. 10 D. 13
4. What does it mean when a student got a score at the 70th percentile on
a test?
A. The performance of the student is above average
B. The student answered 70 percent of the items correctly
C. The student got at least 70 percent of the correct answers
D. The student's score is equal to or above the scores of 70 percent of
the other students in the class
5. Which best describes a normal distribution?
A. Positively skewed C. Symmetric
B. Negatively skewed D. Bimodal
6. What does a large standard deviation indicate?
A. Scores are not normally distributed.
B. Scores are not widely spread, and the median is an unreliable
measure of central tendency.
C. Scores are widely distributed, and the mean may not be a reliable
measure of central tendency.
D. Scores are not widely distributed, and the mean is recommended
as a more reliable measure of central tendency.
7. In a normal distribution, approximately what percentage of scores is
expected to fall within three standard deviations from the mean?
A. 34% C. 95%
B. 68% D. 99%
8. Which of the following is interpreted as the percentage of scores in a
reference group that falls below a particular raw score?
A. Standard scores C. Reference group
B. Percentile rank D. T-score
9. For the data illustrated in the scatter plot
below, what is the reasonable product-
moment correlation coefficient?
A. 1.0
B. -1.0
C. 0.90
D. -0.85
10. A Pearson test statistic yields a correlation coefficient (r) of 0.90.
If X represents scores on a vocabulary test and Y the reading
comprehension test scores, which of the following best
explains r = 0.90?
A. The degree of association between X and Y is 81%.
B. The strength of the relationship between vocabulary and reading
comprehension is 90%.
C. There is almost perfect positive relationship between vocabulary
test scores and reading comprehension.
a. How do you find the mean if data are grouped? What is the most
appropriate class interval for this set of data? What other information
do you need to generate to compute the mean?
b. How do we determine the median of the scores?
In what class interval did the median fall?
What is the cumulative frequency below the median class?
What is the frequency of the median class?
What is the lower limit of the median class?
What is the median equal to?
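The questions in item (b) walk through the grouped-data median formula, Md = L + ((n/2 − cf) / f) × h. A sketch follows, assuming equal class widths; the frequency distribution is hypothetical:

```python
def grouped_median(intervals):
    """Median for grouped data: Md = L + ((n/2 - cf) / f) * h, where L is the
    lower boundary of the median class, cf the cumulative frequency below it,
    f its frequency, and h the class width. `intervals` is a list of
    (lower_boundary, frequency) pairs of equal width, in ascending order."""
    n = sum(f for _, f in intervals)
    h = intervals[1][0] - intervals[0][0]
    cf = 0
    for lower, f in intervals:
        if cf + f >= n / 2:        # the class where the middle case falls
            return lower + ((n / 2 - cf) / f) * h
        cf += f

# Hypothetical frequency distribution: (lower class boundary, frequency)
data = [(9.5, 4), (19.5, 6), (29.5, 10), (39.5, 7), (49.5, 3)]
md = grouped_median(data)   # -> 34.5
```

Here n = 30, the median class is 29.5-39.5 (cf = 10, f = 10, h = 10), so Md = 29.5 + (15 − 10)/10 × 10 = 34.5, mirroring the step-by-step questions above.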
P. Take hold of a scientific calculator. Use the raw score formula in finding
the standard deviation. If you have Excel or other statistical software,
you can work directly on the data.
1. What is the variance?
2. What is the standard deviation?
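The raw score formula referred to in item P computes the variance from the sums ΣX and ΣX² without first listing each deviation from the mean. A sketch, assuming the population form of the formula and hypothetical scores:

```python
import math

def raw_score_sd(scores):
    """Population variance and SD via the raw score formula:
    variance = (sum(X^2) - (sum(X))^2 / n) / n."""
    n = len(scores)
    sum_x = sum(scores)
    sum_x2 = sum(x * x for x in scores)
    variance = (sum_x2 - sum_x ** 2 / n) / n
    return variance, math.sqrt(variance)

scores = [10, 12, 12, 15, 18, 20, 25]
variance, sd = raw_score_sd(scores)
```

This agrees with the deviation-score definition of the variance; the raw score form is simply more convenient on a calculator because only running sums are needed.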
Q. Assuming that the scores are normally distributed, what is the range of
scores that would fall between:
1. ±1σ from the mean. Explain your answer.
2. ±2σ from the mean. How did you determine this range of scores?
3. ±3σ from the mean. Explain your answer.
4. Define the 25th percentile. How did you determine it?
5. Where is the 75th percentile? How did you locate it? How many fall on
this percentile?
6. How many got a percentile rank of 99? How did you determine it?
Explain.
R. With the help of your scientific calculator, Excel, or other statistical
software, load a data set - either hypothetical or actual data - that you
have accessed from available documents or research studies. Analyze the
data and enter the values into the following table:
Mean
Median
Mode
Range
Variance
Standard Deviation
Skewness
Kurtosis
Examine the values you have written in the table. Discuss the graph in
relation to the values in the table.
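Python's standard statistics module covers most of the table, but not skewness and kurtosis; a sketch computing them as standardized third and fourth moments (population form assumed; the data are hypothetical):

```python
import statistics

def shape_moments(scores):
    """Population skewness and kurtosis as standardized third and fourth
    moments (under this definition, a normal curve has kurtosis of 3)."""
    n = len(scores)
    mean = statistics.mean(scores)
    sd = statistics.pstdev(scores)
    skew = sum(((x - mean) / sd) ** 3 for x in scores) / n
    kurt = sum(((x - mean) / sd) ** 4 for x in scores) / n
    return skew, kurt

# Hypothetical scores with a tail toward the high end
data = [10, 12, 12, 15, 18, 20, 25]
skew, kurt = shape_moments(data)   # skew > 0: positively skewed
```

A positive skew indicates the tail points toward the higher scores; a negative skew, toward the lower ones. Note that some software reports "excess kurtosis" (the value above minus 3).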
Educator’s Input
In many programs, topics on measures of central tendency, measures
of variability, and measures of correlation are generally taught by mathematics
teachers. This is because these topics are core content in Statistics, and for
many, statistics is mathematics. Many personal encounters with students
made me think that instructional discourse on the subject of Measurement
and Assessment of Learning that involves statistics tends to be restricted to
performing mathematical procedures, such as calculating the mean, median,
mode, standard deviation, etc. My graduate students who are already
practitioners in their own field have shared their year-end reports of student
performance, including computations of the mean, median, mode, and
standard deviation. I am happy to hear that they did not encounter difficulties
in mechanical computations with scientific calculators and Microsoft Excel.
When I asked why they had to compute all three measures of central
tendency for their student performance report, they could not give me a
convincing reason. I further asked what they did with the standard deviation
and how they utilized all those numerical values, and I did not get a logical
justification. While they were confident in stating the conceptual meaning of
the mean, median, mode, and measures of variability, their grasp of the "big
ideas" that underlie these statistical concepts and of their function in the
improvement of learning appeared very much wanting. Whether or not I had
been guilty of being too procedural in my own teaching of Statistics during my
early years, I later discovered a teaching approach that deepens my
students' understanding of measurement theories, concepts, and principles.
The strategy of “posing relevant scenario” or simply “scenario-posing” for
students to examine and explore can be effective. From the scenario, I can
generate prompts to invite my students to participate in the class discussion.
Most likely, students become interested because in presenting a scenario, I
create a story. I want to believe that a story-form presentation works across
age levels, not only for young children. In the early years, the scenarios I
presented were mostly hypothetical; my objective was primarily to dramatize
the main idea embodied in the statistical measure. In later years, these
scenarios became more real and effortless based on my actual experiences
as a teacher educator and a researcher.
In presenting the scenario, I always have in mind the “seed concepts”,
which are in coherence with my lesson objectives. I have been practicing this
approach for quite some time, and I think teachers using this strategy will be
able to deepen student knowledge by eliciting from them the nuances of the
information embodied in the story problem. These examples may be worth
sharing:
1. The town I live in had a population of 5,000 native people in 2017, and
the mean income per person was Php 30,000.00. Now, suppose Mr.
Manuel, a millionaire from a distant region, moves to my town in
2018. Let us say that the income of Mr. Manuel was Php
120,000,000.00. So, with 5,001 people now living in my town, what is
the mean income of my town? Php 53,989.00? Does this information
indicate that the 5,000 natives in my town suddenly made Php 23,989
more in income in 2018?
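The scenario's arithmetic can be checked in a few lines; the sketch also shows why the median is the safer "typical income" when a single outlier is present:

```python
import statistics

# The scenario above: 5,000 residents each earning Php 30,000,
# then one new resident earning Php 120,000,000
incomes = [30_000] * 5_000 + [120_000_000]

mean_income = statistics.mean(incomes)      # about 53,989: pulled up by one outlier
median_income = statistics.median(incomes)  # 30,000: unchanged, the typical income
```

This is the "seed concept" of the scenario: a single extreme value distorts the mean, while the median stays put.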
References
Cheusheva, Svetlana (September 4, 2020). Mean, median and mode in
Excel. Retrieved from https://www.ablebits.com/office-addins-
blog/2017/05/24/mean-median-mode-excel/
David et al. (2020). Assessment in Learning 1. Manila: Rex Book Store.
De Guzman, E. and Adamos, J. (2015). Assessment of Learning 1. Quezon
City: Adriana Publishing Co., Inc.
Frost, Jim (n.d.). Measures of Variability: Range, Interquartile Range,
Variance, and Standard Deviation. Retrieved from
https://statisticsbyjim.com/basics/variability-range-interquartile-
variance-standard-deviation/.
McLeod, Saul (2019). Introduction to the Normal Distribution (Bell Curve).
Retrieved from https://www.simplypsychology.org/normal-
distribution.html
Mean, Median, Mode: What They Are, How to Find Them. Retrieved from
https://www.statisticshowto.com/probability-and-statistics/statistics-
definitions/mean-median-mode/
Statisticsfun (September 20, 2009). How to calculate Standard Deviation,
Mean, Variance Statistics, Excel. [Video]. YouTube:
https://www.youtube.com/watch?v=efdRmGqCYBk.
Pre-discussion
Grading and reporting learners’ test performance is a complex task. It
requires specific knowledge, skills, and experience. To perform successfully
the assigning of grades and reporting of level of performance or achievement,
pre-service teachers should be able to understand its purpose, identify
different methods of scoring and grading test performance, differentiate the
various types of test scores, and interpret test results based on norms and
pre-set standards.
What to Expect?
At the end of the lesson, the students can:
1. define what grading is;
Definition of Grading
In his paper, Magno (2010) discussed that an effective and efficient way
of recording and reporting evaluation results is very important and useful to
the persons concerned in the school setting. Hence, it is very important that
students' progress is recorded and reported to them, their parents, teachers,
school administrators, and counselors, because this information will be used
to guide and motivate students to learn and to establish cooperation and
collaboration between the home and the school. It is also used in certifying
students' qualifications for higher educational levels and for employment. In
the educational setting, grades are used to record and report students'
progress.
Grades are essential in education because they allow students' learning
to be assessed, quantified, and communicated. Every teacher needs to assign
grades based on assessment tools such as tests, quizzes, projects, and so
on. Through these grades, achievement of learning goals can be
communicated to students, parents, teachers, administrators, and counselors.
However, it should be remembered that grades are just one part of
communicating student achievement; they must therefore be used alongside
additional feedback methods.
Grading implies (a) combining several assessments, (b) translating the
result into some type of scale that has evaluative meaning, and (c) reporting
the result in a formal way. From this definition, we can clearly say that grading
is more than the quantitative value many see it as; rather, it is a process.
Grades are frequently confused with scores, but it must be clarified that
scores make up the grades. Grades are what is written in students' report
cards, which compile students' progress and achievement throughout a
quarter, a trimester, a semester, or a school year.
Grades are symbols used to convey the overall performance or
achievement of a student and they are frequently used for summative
assessments of students. Take for instance two long exams, five quizzes, and
ten homework assignments as requirements for a quarter in a particular
subject area. To arrive at grades, a teacher must be able to combine scores
from the different sets of requirements and compute or translate them
according to the assigned weights or percentages. Then, he or she should
also be able to design effective ways to communicate the results to students,
parents, administrators, and others who are concerned. Another term
commonly used to refer to the process is marking. Figure 1 shows a
graphical summary of the grading process.
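The combining-and-weighting step described above can be sketched in a few lines; the component names, weights, and scores below are hypothetical, not a prescribed scheme:

```python
# Hypothetical components: (weight, average score in percent).
# The weights are an assumption and must sum to 1.0.
components = {
    "long_exams": (0.50, 84),
    "quizzes":    (0.25, 88),
    "homework":   (0.25, 92),
}

# Weighted combination of the component scores into a single grade
grade = sum(weight * score for weight, score in components.values())  # -> 87.0
```

The resulting number would then be translated into whatever evaluative scale the school uses (letter grades, descriptors, and so on) before being reported.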
There are various reasons why we assign grades and report learners’
test performance. Grades are alphabetical or numerical symbols/marks that
indicate the degree to which learners are able to achieve the intended
learning outcomes. Grades do not exist in a vacuum but are part of the
instructional process and serve as a feedback loop between the teacher and
learners. They are one of the ways to communicate the level of learning of the
learners in specific course content. They give feedback on what specific
topic/s learners have mastered and what they need to focus on more when they
review for summative assessment or final exams. In a way, grades serve as a
motivator for learners to study and do better in the next tests to maintain or
improve their final grade.
Grades also provide the parents, who have the greatest stake in
learners' education, with precise information about their children's achievements.
They give teachers the bases for improving their teaching and learning
practices and for identifying learners who need further educational
intervention. They are also useful to school administrators who want to
evaluate the effectiveness of the instructional programs in developing the
needed skills and competencies of the learners.
Magno (2010) emphasized that the purposes of grading in the
educational setting can be categorized into four major parts.
Administrative Purposes
At their end, school administrators can use the grades of students for
more general purposes than teachers do: they can utilize grades to evaluate
programs, identify and assess areas that need to be improved, and determine
whether or not the curriculum goals and objectives of the school have been
attained by the students through their institution.
Promotion and Retention. Grades can serve as one factor in determining
whether a student will be promoted to the next level or not. Through a
student's grades, it can be determined whether he or she has acquired the
skills and competencies required for a certain level and achieved the
curriculum goals and objectives of the school and/or the state. In some
schools, the grade of a student is a factor taken into consideration for his/
her eligibility to join extracurricular activities (performing and theater arts,
varsity teams, cheering squads, etc.). Grades are also used to qualify a student
to enter high school or college in some cases. Other policies may arise
depending on the schools’ internal regulations. At times, failing marks
may prohibit a student from being part of the varsity team, running for
office, joining school organizations, and enjoying other privileges that
students with passing grades get. In some colleges and universities, students who
get passing grades are given priority in enrolling for the succeeding term,
as compared to students who get failing grades.
Placement of Students and Awards. Grades can be used for the placement
of students. They are factors to be considered in placing students
according to their competencies and deficiencies, through which teaching
can be more focused on developing the strengths and improving the
weaknesses of students. For example, students who consistently get high,
average, or failing grades are each placed in one section, wherein
teachers can focus more on and address the students' needs and
demands to ensure a more productive teaching-learning process. Another,
more domain-specific example would be grouping students having the
same competency in a certain subject together. Through this strategy,
students who have high ability in Science can further improve their
knowledge and skills by receiving more complex and advanced topics and
activities at a faster pace, while students having low ability in Science can
receive simpler and more specific topics at a slower pace. Aside from the
placement of students, grades are frequently used as the
basis for academic awards. Almost all schools, colleges, and universities
have honor rolls and dean's lists to recognize student achievement and
performance. Grades also determine graduation awards for the overall
achievement or excellence a student has garnered throughout his/her
education, whether in a single subject or in the whole program taken.
Program Evaluation and Improvement. Through the grades of students
taking a certain program, its effectiveness can be evaluated to some
extent. Grades of students can be a factor used in determining
whether the program was effective or not. Through the evaluation
process, some factors that might have affected the program’s
effectiveness can be identified and minimized to improve the program
further for future implementations.
Admission and Selection. Organizations external to the school also use
grades as a reference for admission. When students transfer from one
school to another, their grades play a crucial role in their admission. Most
colleges and universities also use students' grades in their senior year of
high school, together with the scores they acquire on the entrance exam.
However, grades from academic records and high-stakes tests are not the
sole basis for admission; some colleges and universities also require
recommendations from the school, teachers, and/or counselors about
students' behavior and conduct. The use of grades is not limited to the
educational context. They are also used in employment for job selection
purposes, and at times even by insurance companies that use grades as a
basis for giving discounts on insurance rates.
For the above item, the correct answer is c (X = -32), and this response will
merit a score. Responses other than c will be given zero (0) points.
Item # Score
1 1
2 0
3 -0.25
4 1
5 1
6 0
7 -0.25
8 1
9 1
10 1
Total 6 + (-0.50) = 5.5
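The scoring in the table can be reproduced in code. A sketch, assuming 1 = correct, 0 = omitted, and −0.25 = a wrong answer under a correction-for-guessing penalty:

```python
# Item scores from the table above, assuming 1 = correct, 0 = omitted,
# and -0.25 = wrong answer (a correction-for-guessing penalty)
item_scores = [1, 0, -0.25, 1, 1, 0, -0.25, 1, 1, 1]

earned = sum(s for s in item_scores if s > 0)     # points from correct answers -> 6
penalty = sum(s for s in item_scores if s < 0)    # total deduction -> -0.5
total = earned + penalty                          # -> 5.5
```

This reproduces the table's total of 6 + (−0.50) = 5.5.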
Example:
Linda obtained a score of 55% in her Reading Test. What does her score
mean? Justify your answer.
a. Linda got 55% of the test items correct.
b. Linda was able to answer correctly more than half of the items.
c. Linda obtained a raw score lower than those obtained by 55% of her
peers.
d. If the test has 60 items, Linda would probably have 33 correct answers.
For this item, each response option has an assigned score with its
corresponding rationale. An example of how the item can be scored is shown below:
Option Points Rationale
A 3 Since the score was presented in percent, this is the correct
interpretation.
B 1 While the interpretation may be correct, it does not give a
more specific meaning to the score. Besides, the same
interpretation can also apply to any score higher than
51%.
C 0 This interpretation is wrong, as it applies to a score at the
55th percentile rank.
D 2 This interpretation gives an example of how the score was
computed.
same or equivalent final exams and apply the formula for standard scores to
compute for the percentile ranks for each range of scores. On the other hand,
passing grades/scores are usually set by the department or the school based
on their standards (e.g., A (90-100 percent), B (80-89 percent), C (70-79
percent), or F (0-69 percent).
Marking or scoring constructed-response tests, such as essay and
performance tests, requires standardized scoring schemes so that scores are
reliable and have the same valid meaning for all learners. There are four
types of rating scales for the assessment of writing, which can also be applied
to other authentic or performance-type assessment. These four types of
scoring are (1) Holistic, (2) Analytic, (3) Primary Trait, and (4) Multiple Trait
scoring.
Holistic Scoring. It involves giving a single, overall assessment score for an
essay, writing composition, or other performance-type assessment as a
whole. Although the scoring rubric for holistic scoring lays out specific
criteria for evaluating a task, raters do not assign a score for each
criterion. Instead, as they read a writing task or observe a performance
task, they balance strengths and weaknesses among the various criteria
to arrive at an overall assessment. Holistic scoring is considered efficient
in terms of time and cost. It also does not penalize poor performance
on only one aspect (e.g., content, delivery, organization,
vocabulary, or coherence for an oral presentation). However, it is said that
holistic scoring does not provide sufficient diagnostic information about
the students' ability, as it does not identify areas for improvement, and it
is difficult to interpret as it does not detail the basis for evaluation.
The following is an example of a rubric for an oral presentation:
Rating/Grade Characteristics
A Is very organized. Has a clear opening statement that catches
(Exemplary) audience’s interest. Content of report is comprehensive and
demonstrates substance and depth. Delivery is very clear and
understandable. Uses slides/multimedia equipment effortlessly
to enhance presentation.
B Is mostly organized. Has opening statement relevant to topics.
(Satisfactory) Has appropriate pace and without distracting mannerisms.
Looks at slides to keep on track.
C Has opening statement relevant to topic but does not give
(Emerging) outline of speech; is somewhat disorganized. Lacks content and
the students’ test performance can be compared with one another in the
class or with their peers in another section. In the same manner, the
percentage score is suitable to use in subjects wherein a standard has
been set. For example, if an algebra subject sets a passing score of 60%
on a test (e.g., 60% is considered average), the teachers and
learners would know whether a learner has met the desired level of
competencies through his/her percentage score.
Aside from the above test scores, the decision on what type of test
scores to use is based on whether the learners’ test performance is to be
compared with a standard or criterion or with the scores of other learners
or peers. This decision will entail the choice between the two major types
of grading system: 1) criterion-referenced; and 2) norm-referenced grading
system.
3. Criterion-Referenced Grading System. This is a grading system wherein
learners’ test scores or achievement levels are based on their
performance in specified learning goals and outcomes and performance
standards. Criterion-referenced grades provide a measure of how well
the learners have achieved the preset standards, regardless of how
everyone else does. It is therefore important that the desired outcomes
and the standards that determine proficiency and success are clear to the
learners at the very start. These should be indicated in the course
syllabus. Criterion-referenced grading is premised on the assumption that
learners’ performance is independent of the performance of the other
learners in their group/class.
The following are some of the types of criterion-referenced scores
or grades:
a. Pass or Fail Grade. This type of score is most appropriate if the
purpose of the test or assessment is primarily or entirely to make a
pass or fail decision. In this type of scoring, a standard or cut-off score
is preset, and a learner is given a score of Pass if he or she surpasses
the expected level of performance or the cut-off score. Pass or Fail scoring is most
appropriate for comprehensive or licensure exams because there is no
limit to the number of examinees who can pass or fail. Each individual
level. However, it would be best that these descriptors are paired with
specific performance indicators that identify the qualitative differences
between grade categories.
Another disadvantage of letter grades is that the cut-offs
between grade categories are always arbitrary and difficult to justify.
For example, if a grade of C covers scores from 76 to 85, learners who get a
score of 76 in a writing test and those who receive a score of 85 will
both get the same letter grade of C despite the nine-point difference.
Now, if the next range of grades is 86 to 96, then the one who gets an
86 receives a grade of B although it is just one score higher than 85,
which receives a grade of C. Furthermore, letter grades lack the
richness of more detailed grading methods.
c. Plus (+) and Minus (-) Letter Grades. This grading provides a more
detailed description of the level of learners' achievement or task/test
performance by dividing each grade category into three levels, such
that a grade of A can be assigned as A+, A, and A-; B as B+, B, and B-;
and so on. Plus (+) and minus (-) grades provide a finer discrimination
between achievement or performance levels. They also increase the
accuracy of grades as a reflection of learners' performance; enhance
student motivation (i.e., to get a high A rather than an A-); and
discriminate among performances in a very similar pool of learners, such as
those in advanced courses or star sections. However, the +/- grading system is
viewed as unfair, particularly for learners in the highest category; it creates
stress for learners; and it is more difficult for teachers, as they need to deal with
more grade categories when grading learners. Examples of the descriptors for
plus (+) and minus (-) letter grades are presented in the next matrix:
(+)/(-) Letter Grades Interpretation
A+ Excellent
A Superior
A- Very Good
B+ Good
B Very Satisfactory
B- High Average
C+ Average
C Fair
C- Pass
D Conditional
E/F Failed
d. Standard Scores. These are raw scores converted into a common
scale of measurement that provides a meaningful description of
the individual scores within the distribution. A standard score describes
the difference of a raw score from the sample mean, expressed in
standard deviation units. The two most commonly used standard scores
are the (1) z-score and (2) T-score.
i. Z-score. The z-score is a standard score with a mean of 0 and
a standard deviation of 1. It is computed using the following
formula:
z = (X - M) / SD
where X is the raw score, M is the mean of the group, and SD is
the standard deviation.
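A sketch of the z-score, together with the related T-score (T = 50 + 10z); the raw scores, class means, and standard deviations below are hypothetical:

```python
def z_score(raw, mean, sd):
    """z = (X - M) / SD: distance of a raw score from the mean in SD units."""
    return (raw - mean) / sd

def t_score(raw, mean, sd):
    """T = 50 + 10z: a standard score with mean 50 and SD 10."""
    return 50 + 10 * z_score(raw, mean, sd)

# Two scores both 5 points above their class means, but the classes
# differ in variability (all numbers hypothetical):
luis_z = z_score(80, 75, 1)      # -> 5.0 (SD of 1 in Luis' class)
michael_z = z_score(80, 75, 5)   # -> 1.0 (SD of 5 in Michael's class)
```

The same raw-score advantage thus translates into very different standard scores once each class's variability is taken into account.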
Standard scores are useful when you want to compare learners’ test
While the difference between the raw scores of Luis and Michael from
their class means is the same (i.e., 5), Michael's standard score is lower
than Luis' standard score (z of 1 vs. z of 5). This is because the
variability in scores in Michael's class is higher than that in Luis'
1. Identify the criteria for rating the essay. The criteria or standards for
evaluating the essay should be predetermined. Some of the criteria that
can be used include content, organization/format, grammar proficiency,
development and support, focus and details, etc. It is important that the
specific standards and criteria included are relevant to the type of
performance tasks given.
2. Determine the type of rubric to use. There are two basic types of rubric:
the holistic and the analytic scoring system. A holistic rubric requires
evaluating the essay while taking all the criteria into consideration; only a
single score is given, based on the overall judgment of the learner's writing
composition. The holistic rubric is viewed as more convenient for teachers,
as it requires fewer areas or aspects of writing to evaluate. However, it does
not provide specific feedback on which course topics/content or criteria the
students are weak at and need to improve on. On the other hand, the analytic
scoring system requires that the essay be evaluated on each of the
criteria. It provides useful feedback on the learner's strengths and weaknesses
for each course content or criterion.
3. Prepare the rubric. In developing a rubric, the skills and competencies related to
essay writing should first be identified; these skills and competencies represent
the criteria. Then, performance benchmarks and point values are determined.
Performance benchmarks can be numerical categories, but the most frequently used
are descriptors with a corresponding rating scale.
Point Values   Sample Performance Benchmarks
1              Needs Improvement   Beginning      Novice         Inadequate
2              Satisfactory        Developing     Apprentice     Developing
3              Good                Accomplished   Proficient     Proficient
4              Exemplary           Exceptional    Distinguished  Skilled
Illustrative Example:
Assuming that a student has obtained the following raw scores in the different
components in English subject:
Components Total Score Total Possible Score
Written Works 145 160
Performance Tasks 100 120
Quarterly Assessment 50 50
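One way to combine these components is to convert each to a percentage score and weight it. The weights below follow the DepEd K to 12 weighting for languages (WW 30%, PT 50%, QA 20%); treat them as an assumption for this sketch, since the actual weights vary by subject:

```python
# Components from the example above: (raw score, total possible, weight).
# Weights assume the DepEd language weighting (WW 30%, PT 50%, QA 20%).
components = {
    "written_works":        (145, 160, 0.30),
    "performance_tasks":    (100, 120, 0.50),
    "quarterly_assessment": (50,  50,  0.20),
}

initial_grade = sum(
    (score / possible) * 100 * weight   # percentage score times its weight
    for score, possible, weight in components.values()
)
# round(initial_grade, 2) -> 88.85
```

Under DepEd guidelines, this initial grade would then be converted to the reported grade using the transmutation table.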
Summary
In this lesson, we were able to discuss exhaustively the purposes of grading
and communicating learners' test performance, the various methods of
marking or scoring tests and performance tasks, the different methods of
grading learners' performance in assessments, the types of test scores, general
guidelines in grading tests or performance tasks, general guidelines in scoring
essay tests, and how test results can be communicated. Finally, the guidelines
on classroom assessment of the DepEd K to 12 Basic Education Program were
likewise highlighted.
Enrichment
1. Read the following articles:
1. Magno, C. (2010). The Functions of Grading. The Assessment
Handbook, Vol. 3.
2. Guskey, T. R. (2001). Grading and Reporting Student Learning. Corwin
Press: KY, USA.
3. Brookhart, Susan M. (2013). How to Create and Use Rubrics for
formative assessment and grading. Virginia, USA: ASCD.
2. Watch this video:
Nancy Heilbronner (2019, April 2). Grading and Reporting. [Video].
YouTube: https://www.youtube.com/watch?v=SHBQTbymAP4
Assessment
A. Let us review what you have learned about grading and communicating
test results.
1. What are the purposes of grading and communicating learners’ test
performance?
2. What are the different methods in marking or scoring tests or
performance tasks?
ERNIE C. CERADO, PhD/MA. DULCE P. DELA CERNA, MIE 266
SULTAN KUDARAT STATE UNIVERSITY
B. After the discussion on grading and reporting test scores, you are now
ready to identify the methods of scoring/grading and types of scores that
you can employ in your assessments. Let us apply what you have learned
by extending the assessment plan that you developed in an earlier
lesson, or you may develop a new one. In addition to the desired learning
outcomes, course topics, and test formats that you have listed down for
each subject, please identify the methods of scoring, types of grades, and
reporting strategies that you will employ.
C. Let us then come up with a grading and reporting scheme for each type of
assessment that you will employ in each of your subjects. In the
development of the grading and reporting scheme, you need the following
information:
1. Purpose of Assessment: Why is this assessment being conducted?
Is it for learners’ monitoring and improvement (formative), or is it for
demonstrating student achievement (summative)?
2. Desired Learning Outcomes for the Topic/Subject Area: What are
the learning outcomes expected from the learners for this unit/subject?
3. Type of Assessment: How will each outcome be measured?
4. Grading Criteria: What are the criteria to include that demonstrate
achievement of the stated desired learning outcomes?
5. Scoring/Grading Method: How will the test/performance tasks be
scored?
6. Type of Score: What types of scores are appropriate to indicate
the students’ level of achievement or performance?
D. Evaluate the sample grading and reporting scheme that you have developed for
each assessment by using the rubric below.

Criteria: Purpose of the Test
- Inadequate (1): The purpose of testing is not specified in the grading and reporting system suggested for the subject area covered.
- Developing (2): The purpose of testing is specified; however, it is not clear or relevant to the grading and reporting system suggested for the subject area covered.
- Proficient (3): The purpose of testing is clearly specified and relevant to the grading and reporting system suggested for the subject area covered.

Criteria: Identification of Intended Learning Outcomes
- Inadequate (1): The intended learning outcomes in the unit/topic/course are not identified and specified in the grading and reporting scheme.
- Developing (2): The intended learning outcomes are listed, but they are not clearly described.
- Proficient (3): The intended learning outcomes in the unit/topic/course are explicitly specified.

Criteria: Types of Tests
- Inadequate (1): The tests by which students’ level of achievement is measured are not valid and appropriate for measuring the extent to which learners have achieved the intended outcomes.
- Developing (2): The tests are appropriate, but they will not provide a complete and valid measure of the extent to which learners have achieved the intended outcomes.
- Proficient (3): The tests will provide an adequate and accurate measure of the extent to which the learners have achieved the intended outcomes.
G. Evaluate your skills in identifying and using appropriate grading and reporting
techniques based on the following scale. Rate yourself separately on using
different scoring techniques and on using different types of scores/grades.

- Proficient (4): I know them very well. I can teach others where and when to use them appropriately.
- Master (3): I can do it by myself, though I sometimes make mistakes.
- Developing (2): I am getting there, though I still need help to be able to perfect it.
- Novice (1): I cannot do it myself. I need help to make an effective grading and reporting scheme.
H. Based on your self-assessment above, choose among the following tasks to help you
enhance your skills and competencies in developing different scoring and grading
techniques:

- Proficient: Help or mentor peers/classmates who are having difficulty in understanding the different techniques in scoring and grading test results.
- Master: Examine the areas that you need to improve on and address them immediately. Benchmark against the scoring and grading schemes developed by peers/classmates who are known to be proficient in this area.
- Developing/Novice: Read more books/references about scoring techniques and types of scores. Ask your teacher to evaluate the grading and reporting scheme that you have developed and to give suggestions on how you can improve it.
Educator’s Input
References
David et al. (2020). Assessment in Learning 1. Manila: Rex Book Store.
De Guzman, E. and Adamos, J. (2015). Assessment of Learning 1. Quezon
City: Adriana Publishing Co., Inc.
D.O. No. 8, s. 2015 (Policy Guidelines on Classroom Assessment for the K to
12 Basic Education Program)
UNIVERSITY VISION
A trailblazer in arts, science and technology in the region.

UNIVERSITY MISSION
The University shall primarily provide advanced instruction and professional training in science and technology, agriculture, fisheries, education and other related fields of study. It shall undertake research and extension services, and provide progressive leadership in its areas of specialization.

UNIVERSITY GOAL
To produce graduates with excellence and dignity in arts, science and technology.

UNIVERSITY OBJECTIVES
a. Enhance competency development, commitment, professionalism, unity and true spirit of service for public accountability, transparency and delivery of quality services;
b. Provide relevant programs and professional trainings that will respond to the development needs of the region;
c. Strengthen local and international collaborations and partnerships for borderless programs;
d. Develop a research culture among faculty and students;
e. Develop and promote environmentally-sound and market-driven knowledge and technologies at par with international standards;
f. Promote research-based information and technologies for sustainable development;
g. Enhance resource generation and mobilization to sustain the financial viability of the university.
h. Facilitate learning using a wide range of teaching methodologies and delivery modes appropriate to specific learners and their environments;
i. Develop innovative curricula, instructional plans, teaching approaches, and resources for diverse learners;
j. Apply skills in the development and utilization of ICT to promote quality, relevant, and sustainable educational practices;
k. Demonstrate a variety of thinking skills in planning, monitoring, assessing, and reporting learning processes and outcomes;
l. Practice professional and ethical teaching standards sensitive to the local, national, and global realities;
m. Pursue lifelong learning for personal and professional growth through varied experiential and field-based opportunities;
n. Demonstrate in-depth understanding of the diversity of learners in various learning areas;
o. Manifest meaningful and comprehensive pedagogical content knowledge (PCK) of the different subject areas;
p. Utilize appropriate assessment and evaluation tools to measure learning outcomes;
q. Manifest skills in communication, higher-order thinking skills, and use of tools and technology to accelerate learning and teaching;
r. Demonstrate positive attitudes of a model teacher, both as an individual and as a professional; and
s. Manifest a desire to continuously pursue personal and professional development.
h. Enhance the quality of tests through judgmental test-improvement and other empirically-based procedures;
i. Ensure the validity and reliability of the constructed test;
j. Organize the data derived from tests using tables and charts;
k. Use statistics to analyze, interpret, and use test data in decision making; and
l. Observe the guidelines in test scoring and grading as well as its methods of reporting.
7. Course Contents
Each lesson entry below lists the topics and time allotment, the desired student learning outcomes, the outcomes-based assessment (OBA) activities, the evidence of learning, the course outcomes and program objectives addressed, and the values integrated.

Lesson 0. Course Orientation (3 hours)
Topics: Course Syllabus; Basic academic policies
Desired Student Learning Outcomes:
1. Explain the vision and mission, and the significant academic policies of the University
2. Enumerate the course desired learning outcomes
3. Use the syllabus as reference for independent learning
4. Simulate the computation of one’s grades given the criteria
OBA Activities: Recite sincerely the University Vision and Mission; Involvement in the G-class
Evidence of Learning: Oral Recitation (OR); Class Participation Rating (CPR)
Course Outcomes: a
Program Objectives: a, b
Values Integration: Accountability, Excellence
Topics: Meaning of Learning Assessment, Evaluation and Measurement; Principles in Assessing Learning; Grading and Testing
Desired Student Learning Outcomes:
2. Compare assessment with measurement and evaluation
3. Discuss testing and grading
4. Explain the different principles in assessing learning
5. Relate an experience as a student or pupil related to each principle
6. Comment on the tests administered by past teachers
7. Perform simple evaluation
OBA Activities: Testing and grading practices of past teachers through a case presentation; Self-assessment as contained in the last part of the module; Involvement in the G-class
Evidence of Learning: Exercises/Quiz Scores (EQS); Case Report Rating (CRR); Quiz; Class Participation Rating (CPR)
Values Integration: Transparency, Justice
Lesson 2. Assessment Purposes, Learning Objectives/Targets and Appropriate Methods (4.5 hours)
Topics: Purpose of Classroom Assessment; Bloom’s Taxonomy of Educational Objectives; Learning Objectives; Learning Targets; Matching Appropriate Assessment Methods
Desired Student Learning Outcomes:
1. Articulate the purpose of classroom assessment
2. Tell the difference between Bloom’s Taxonomy and the Revised Bloom’s Taxonomy in stating learning objectives
3. Apply the Revised Bloom’s Taxonomy in writing learning objectives
4. Discuss the importance of learning targets in instruction
5. Formulate learning targets
6. Match the assessment methods with specific learning objectives/targets
OBA Activities: Completion of Table of Learning Objectives/Targets; Presentation of matrix of learning targets and methods of assessment; Self-assessment as contained in the last part of the module; Involvement in the G-class
Evidence of Learning: Exercises/Quiz Scores (EQS); Case Report Rating (CRR); Quiz; Class Participation Rating (CPR)
Course Outcomes: d
Program Objectives: b, g, h, p, r
Values Integration: Objectivity, Justice, Truthfulness
Topics: Test Improvement (Teacher’s Own Review, Peer Review, Student Review); Other Empirically-Based Procedures (Difficulty Index, Index of Discrimination, Distracter Analysis)
Desired Student Learning Outcomes:
1. … judgmental item-improvement and other empirically-based procedures
2. Evaluate which type of test item-improvement is appropriate to use
3. Compute and interpret the results for index of difficulty, index of discrimination and distracter efficiency
4. Demonstrate knowledge on the procedures for improving a classroom-based assessment
OBA Activities: Item Analysis Results (Difficulty Index and Index of Discrimination); Oral Recitation; Involvement in the G-class
Evidence of Learning: Exercises/Quiz Scores (EQS); Checklist Rating (CLR); Quiz; Class Participation Rating (CPR)
Program Objectives: r, t
Values Integration: Objectivity
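The empirically-based procedures named in this lesson (index of difficulty and index of discrimination) reduce to two short formulas. The sketch below uses the common upper-group/lower-group method; the group size of 10 and the sample response counts are illustrative assumptions, not data from the module.

```python
def difficulty_index(correct: int, examinees: int) -> float:
    """Proportion of examinees answering the item correctly (p-value).
    Values near 1.0 indicate an easy item; values near 0.0, a difficult one."""
    return correct / examinees

def discrimination_index(upper_correct: int, lower_correct: int, group_size: int) -> float:
    """(Upper-group correct minus lower-group correct) divided by group size.
    Positive values mean the item separates high scorers from low scorers."""
    return (upper_correct - lower_correct) / group_size

# Illustrative data: 40 examinees; upper and lower groups of 10 each.
p = difficulty_index(correct=28, examinees=40)                             # 0.70
d = discrimination_index(upper_correct=9, lower_correct=4, group_size=10)  # 0.50
print(p, d)
```

An item with p = 0.70 and d = 0.50 would typically be retained; an item with a near-zero or negative d is a candidate for revision, which is where the judgmental procedures (teacher, peer, and student review) take over.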
… distribution
7. Characterize a frequency distribution graph in terms of skewness and kurtosis
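Outcome 7 above, characterizing a distribution in terms of skewness and kurtosis, can also be checked numerically rather than only from a graph. The sketch below uses the conventional moment-based formulas; the sample test scores are made up for illustration.

```python
from statistics import mean, pstdev

def skewness(scores: list[float]) -> float:
    """Third standardized moment: negative = left-skewed, positive = right-skewed."""
    m, s, n = mean(scores), pstdev(scores), len(scores)
    return sum((x - m) ** 3 for x in scores) / (n * s ** 3)

def kurtosis(scores: list[float]) -> float:
    """Fourth standardized moment: about 3.0 for a normal (mesokurtic) distribution."""
    m, s, n = mean(scores), pstdev(scores), len(scores)
    return sum((x - m) ** 4 for x in scores) / (n * s ** 4)

# Illustrative test scores: one very low score pulls the skew negative
# and produces a heavy tail (kurtosis above 3, i.e., leptokurtic).
scores = [10, 35, 36, 38, 39, 40, 40, 41, 42, 45]
print(round(skewness(scores), 2), round(kurtosis(scores), 2))
```

A negatively skewed score distribution like this one often means the test was easy for most learners, with a few outliers at the low end.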
8. Course Evaluation
Course Requirements: The following are the course requirements: (a) Examinations (Midterm and Final); (b) Quizzes/Exercises; and (c) Class Participation/Involvement.
Course Policies: All students must adhere to these class guidelines: (a) act politely, responsibly, and with maturity; (b) arrive on time and be ready for instruction; (c) set cell phones to silent mode and keep them inside their bags; (d) contribute to an orderly learning environment; (e) consult the professor when deemed necessary; (f) establish good rapport with professors; (g) maintain silence during oral reports/presentations; and (h) cooperate in classroom activities or any task performances.
References
Book
Andrade, H. (2010). Students as the definitive source of formative assessment: Academic self-assessment and the self-regulation of learning. In H. Andrade & G. Cizek (Eds.), Handbook of formative assessment (pp. 90–105). New York, NY: Routledge.
Brookhart, S. M. (2013). How to create and use rubrics for formative assessment and grading. Virginia, USA: ASCD.
David et al. (2020). Assessment in Learning 1. Manila: Rex Book Store.
De Guzman, E. and Adamos, J. (2015). Assessment of Learning 1. Quezon City: Adriana Publishing Co., Inc.
Fives, H. & DiDonato-Barnes, N. (February 2013). Classroom test construction: The power of a table of specifications. Practical Assessment, Research & Evaluation, 18(3).
Hattie, J. (2012). Visible learning for teachers: Maximizing impact on learning. New York: Routledge.
Klenowski, V. (1995). Student self-evaluation processes in student-centred teaching and learning contexts of Australia and England. Assessment in Education: Principles, Policy & Practice, 2(2).
Macayan, J. (2017). Implementing Outcome-Based Education (OBE) framework: Implications for assessment of students’ performance. Educational Measurement and Evaluation
Each criterion is rated as Inadequate (0 points), Developing but below expectations (1 point), Accomplished/Meets Expectations (2 points), or Exemplary/Displays leadership (3 points).

Criteria: Level of Engagement and Active Participation
- Inadequate (0): Student never contributes to class discussion; fails to respond to direct questions.
- Developing (1): Few contributions to class discussion; seldom volunteers but responds to direct questions.
- Accomplished (2): Proactively contributes to class discussion, asking questions and responding to direct questions.
- Exemplary (3): Proactively and regularly contributes to class discussion; initiates discussion on issues related to class topics.

Criteria: Listening Skills
- Inadequate (0): Does not listen when others talk, interrupts, or makes inappropriate comments.
- Developing (1): Does not listen carefully, and comments are often non-responsive to the discussion.
- Accomplished (2): Listens and appropriately responds to the contributions of others.
- Exemplary (3): Listens without interrupting, and incorporates and expands on the contributions of other students.

Criteria: Relevance of Contribution to Topic Under Discussion
- Inadequate (0): Contributions, when made, are off-topic or distract the class from discussion.
- Developing (1): Contributions are sometimes off-topic or distracting.
- Accomplished (2): Contributions are always relevant.
- Exemplary (3): Contributions are relevant and promote deeper analysis of the topic.

Criteria: Preparation
- Inadequate (0): Student is not adequately prepared; does not appear to have read the material in advance of class.
- Developing (1): Student has read the material but not closely, or has read only some of the assigned material in advance of class.
- Accomplished (2): Student has read and thought about the material in advance of class.
- Exemplary (3): Student is consistently well prepared; frequently raises questions or comments on material outside …
Each item is rated on the following scale: 1 = Very poor; 2 = Poor; 3 = Adequate; 4 = Good; 5 = Excellent.

1. Evidence of preparation (organized presentation; presentation/discussion flows well; no awkward pauses or confusion from the group/individual; evidence you did your homework). Rating: 1 2 3 4 5
2. Content (group/individual presented accurate and relevant information; appeared knowledgeable about the case studies assigned and the topic discussed; offered strategies for dealing with the problems identified in the case studies). Rating: 1 2 3 4 5
3. Enthusiasm/Audience Awareness (demonstrates strong enthusiasm about the topic during the entire presentation; significantly increases audience understanding and knowledge of the topic; convinces the audience to recognize the validity and importance of the subject). Rating: 1 2 3 4 5
4. Delivery (clear and logical organization; effective introduction and conclusion; creativity; transitions between speakers; oral communication skills such as eye contact). Rating: 1 2 3 4 5
5. Discussion (group/individual initiates and maintains class discussion concerning the assigned case studies; use of visual aids; good use of time; involves classmates). Rating: 1 2 3 4 5