Modules in Assessment in Learning 1 For PRI
Assessment of Student
Learning 1
Preface
COVID-19 has affected the world at large, but this has
also given us a glimpse of the good that exists.
- Amit Gupta
Table of Contents
Foreword ii
Chapter 1 Outcomes-Based Education 1
Lesson 1 Understanding Outcomes-Based Education 1
Chapter 2 Introduction to Assessment in Learning 16
Lesson 1 Basic Concepts and Principles in Assessing Learning 16
Lesson 2 Assessment Purposes, Educational Objectives, Learning Targets and Appropriate Methods 32
Lesson 3 Classifications of Assessment 54
Chapter 3 Development and Enhancement of Tests 71
Lesson 1 Planning a Written Test 71
Lesson 2 Construction of Written Tests 90
Lesson 3 Improving a Classroom-Based Assessment 122
Lesson 4 Establishing Test Validity and Reliability 139
Chapter 4 Organization, Utilization, and Communication of Test Results 161
Lesson 1 Organization of Test Data Using Tables and Graphs 162
Lesson 2 Analysis, Interpretation, and Use of Test Data 191
Lesson 3 Grading and Reporting of Test Results 240
Appendix 1 Course Syllabus 278
CHAPTER 1
OUTCOMES-BASED EDUCATION
Overview
In response to the need for standardization of education systems and
processes, many higher education institutions in the Philippines shifted their
attention and efforts toward implementing the OBE system at the school level.
The shift to OBE has been propelled predominantly by its use as a framework
by international and local academic accreditation bodies in school- and
program-level accreditation, in which many schools invest considerable effort.
The Commission on Higher Education (CHED) even emphasized the need for
the implementation of OBE by issuing a memorandum order on the "Policy
Standard to Enhance Quality Assurance in Philippine Higher Education
through an Outcomes-Based and Typology-Based QA". Consequently, a
Handbook of Typology, Outcomes-Based Education, and Sustainability
Assessment was released in 2014.
Given the current status of OBE in the country, this lesson aims to
shed light on some critical aspects of the framework with the hope of
elucidating important concepts that will ensure proper implementation of OBE.
It also zeroes in on the implications of OBE implementation for the
assessment and evaluation of students' performance.
Objective
Upon completion of this chapter, the students can achieve a good
grasp of outcomes-based education.
Pre-discussion
Primarily, this chapter deals with the shift of educational focus from
content to learning outcomes, particularly in OBE: matching intentions
with the outcomes of education. The students can state and discuss this
change of educational focus from content to learning outcomes.
What to Expect?
At the end of the lesson, the students can:
1. discuss outcomes-based education, its meaning, brief history and
characteristics;
2. identify the procedures in the implementation of OBE in subjects or
courses; and
3. define outcomes and discuss each type of outcomes.
Meaning of Education
According to some learned people, the word education has been
derived from the Latin term "educatum", which means the act of teaching or
training. Other groups of educationalists say that it has come from another
Latin word “educare” which means to bring up or to raise. For a few others,
the word education has originated from another Latin word “educere” which
means to lead forth or to come out. All these meanings indicate that education
seeks to nourish the good qualities in man and draw out the best in every
individual; it seeks to develop the inner, innate capacities of man. By
educating an individual, we attempt to give him/her the knowledge, skills,
understanding, interests, attitudes, and critical thinking. That is, he/she
acquires knowledge of history, geography, arithmetic, language, and science.
Today, outcomes-based education is the main thrust of Higher
Education Institutions in the Philippines. The OBE comes in the form of
competency-based learning standards and outcomes-based quality
assurance monitoring and evaluation spelled out under CHED
Memorandum Order No. 46. Accordingly, CHED OBE differs from
Transformational OBE in several aspects. The CMO acknowledges that there
are two different OBE frameworks, namely the strong and the weak.
What is OBE?
Outcomes-Based Education (OBE) is a process that involves the
restructuring of curriculum, assessment and reporting practices in education
to reflect the achievement of high order learning and mastery rather than the
accumulation of course credits. It is a recurring education reform model, a
student-centered learning philosophy that focuses on empirically measuring
students' performances, which are called outcomes, and on the resources
that are available to students, which are called inputs.
Furthermore, Outcome-Based Education means clearly focusing and
organizing everything in an educational system around what is essential for all
students to be able to do successfully at the end of their learning experiences.
This means starting with a clear picture of what is important for students to be
able to do, then organizing the curriculum, instruction, and assessment to
make sure that this learning ultimately happens.
For education stalwart Dr. William Spady, Outcome-Based Education
(OBE) is a paradigm shift in the education system that is changing the way
students learn, teachers think, and schools measure excellence and success.
He came to the Philippines to introduce OBE and share its benefits. Spady
said that in conceptualizing OBE in 1968, he observed that the US education
system was bent mainly on making students achieve good scores. "So there
are graduates who pass exams, but lack skills. Then there are those who can
do the job well yet are not classic textbook learners." Furthermore, he said
that OBE is not concerned with a single standard for assessing the success of
an individual. "In OBE, real outcomes take us far beyond the paper-and-pencil
test." In the Philippines, learning materials are aligned with OBE through the
following features:
Learning Objectives - Statements that describe what learners/students are
expected to develop by the time they finish a particular chapter. This may
include the cognitive, psychomotor, and affective aspects of learning.
Teaching Suggestions - This section covers ideas, activities, and strategies
that are related to the topic and will help the instructor in achieving the
Learning Objectives.
Chapter Outline - This section shows the different topics/subtopics found in
each chapter of the textbook.
Discussion Questions - This section contains end-of-chapter questions that
will require students to use their critical thinking skills to analyze the
factual knowledge of the content and its application to actual human
experiences.
Experiential Learning Activities - This includes activities that are flexible in
nature. This may include classroom/field/research activities, simulation
exercises, and actual experiences in real-life situations.
Objective Tests - Objective tests of students' knowledge may include any of
the following:
- Identification
- True or False
- Fill in the Blank
- Matching Type
- Multiple Choice
Answer keys to the test questions must be provided.
Assessment for Learning - This may include rubrics that will describe and
evaluate the level of performance/expected outcomes of the learners.
Summary
The change in educational perspective is called Outcomes-Based
Education (OBE), which is characterized by the following:
- It is student-centered; that is, it places the students at the center of the
process by focusing on Student Learning Outcomes (SLO).
- It is faculty-driven; that is, it encourages faculty responsibility for
teaching, assessing program outcomes, and motivating participation
from the students.
- It is meaningful; that is, it provides data to guide the teacher in making
valid and continuing improvements in instruction and other assessment
activities.
To implement OBE in a subject or course, the teacher should identify
the educational objectives of the subject or course so that he/she can help
students develop and enhance their knowledge, skills, and attitudes; he/she
must then list all learning outcomes specified for each subject or course
objective. A good source of learning outcome statements is the taxonomy of
educational objectives by Benjamin Bloom, which is grouped into three
domains:
- the Cognitive, also called knowledge, refers to mental skills such as
remembering, understanding, applying, analyzing, evaluating,
synthesizing, and creating;
- the Psychomotor, also referred to as skills, includes manual or physical
skills, which proceed from mental activities and range from the simplest
to the complex, such as observing, imitating, practicing, adapting, and
innovating;
- the Affective, also known as attitude, refers to growth in feelings or
emotions, from the simplest behavior to the most complex, such as
receiving, responding, valuing, organizing, and internalizing.
ERNIE C. CERADO, PhD / MA. DULCE P. DELA CERNA, MIE
SULTAN KUDARAT STATE UNIVERSITY
The emphasis in an OBE system is on measured outcomes rather
than "inputs," such as how many hours students spend in class, or
what textbooks are provided. Outcomes may include a range of skills and
knowledge. Generally, outcomes are expected to be concretely measurable,
that is, "Student can run 50 meters in less than one minute" instead of
"Student enjoys physical education class." A complete system of outcomes for
a subject area normally includes everything from mere recitation of fact
("Students will name three tragedies written by Shakespeare") to complex
analysis and interpretation ("Student will analyze the social context of a
Shakespearean tragedy in an essay"). Writing appropriate and measurable
outcomes can be very difficult, and the choice of specific outcomes is often a
source of local controversies.
Learning outcomes describe the measurable skills, abilities, knowledge
or values that students should be able to demonstrate as a result of
completing a course. They are student-centered rather than teacher-centered,
in that they describe what the students will do, not what the instructor will
teach. They are not standalone statements. They must all relate to each other
and to the title of the unit and avoid repetition. Articulating learning outcomes
for students is part of good teaching. If you tell students what you expect them
to do, and give them practice in doing it, then there is a good chance that they
will be able to do it on a test or major assignment. That is to say, they will
have learned what you wanted them to know. If you do not tell them what they
will be expected to do, then they are left guessing what you want. If they
guess wrong, they will resent you for being tricky, obscure or punishing.
Finally, outcomes assessment procedures must also be drafted to
enable the teacher to determine the degree to which the students are
attaining the desired learning outcomes. For every outcome, it identifies the
data to be gathered, which will guide the selection of the assessment tools to
be used and the point at which assessment will be done.
Enrichment
Assessment
Activity 1. Fill out the matrix based on your findings about the Educational
Objectives (EO) and create your own Learning Outcomes (LO).
Activity 3. The following statements are incorrect. On the blank before each
number, write the letter of the section which makes the sentence wrong, and
on the blank after each number, re-write the wrong section to make the
sentence correct.
____1. (a) Because of knowledge explosion / (b) brought about by the use of
/ (c) computers in education / (d) the teacher ceased to be the sole source of
knowledge.
______________________________________________________________
______________________________________________________________
____2. (a) At present, / (b) the teacher is the giver of knowledge / (c) by
assisting / (d) in the organization of facts and information.
______________________________________________________________
______________________________________________________________
____3. (a) The change of focus / (b) in instruction / (c) from outcomes to
content / (d) is known as Outcomes-Based Education.
______________________________________________________________
______________________________________________________________
____5. (a) Education comes / (b) from the Latin root / (c) "educare" or
"educere" / (d) which means to "pour in".
______________________________________________________________
______________________________________________________________
____6. (a) In the past, / (b) the focus / (c) of instruction / (d) was learning
outcomes.
______________________________________________________________
______________________________________________________________
____7. (a) Ability to communicate / (b) in writing and speaking / (c) is an
example / (d) of deferred outcome.
______________________________________________________________
______________________________________________________________
____8. (a) The content and the outcome / (b) are the two / (c) main elements
/ (d) of the educative process.
______________________________________________________________
______________________________________________________________
Activity 4. Give the meaning of the following word or group of words. Write
your answers on the spaces provided after each number.
1. Outcomes-Based Education
________________________________________________________
________________________________________________________
________________________________________________________
________________________________________________________
2. Immediate Outcome
________________________________________________________
________________________________________________________
________________________________________________________
3. Deferred Outcome
________________________________________________________
________________________________________________________
________________________________________________________
4. Educational Objective
________________________________________________________
________________________________________________________
________________________________________________________
5. Learning Outcome
________________________________________________________
________________________________________________________
________________________________________________________
6. Student-Centered Instruction
________________________________________________________
________________________________________________________
________________________________________________________
7. Content-Centered Instruction
________________________________________________________
________________________________________________________
________________________________________________________
8. Psychomotor Skill
________________________________________________________
________________________________________________________
________________________________________________________
9. Cognitive Skill
________________________________________________________
________________________________________________________
________________________________________________________
References
De Guzman, E. and Adamos, J. (2015). Assessment of Learning 1. Quezon
City: Adriana Publishing Co., Inc.
Macayan, J. (2017). Implementing Outcome-Based Education (OBE)
Framework: Implications for Assessment of Students' Performance.
Educational Measurement and Evaluation Review, 8(1).
Navarro, R., Santos, R. and Corpuz, B. (2017). Assessment of Learning I (3rd
ed.). Metro Manila: Lorimar Publishing, Inc.
CHAPTER 2
INTRODUCTION TO ASSESSMENT IN LEARNING
Overview
A clear understanding of the course Assessment of Learning has to
begin with one's complete awareness of its fundamental terms and
principles. Most importantly, a good grasp of concepts like assessment,
learning, evaluation, measurement, testing, and test is requisite knowledge
for every pre-service teacher. Sufficient information about these pedagogic
elements would certainly heighten his or her confidence in teaching. The
principles behind assessment likewise need to be studied, as all activities
related to it must be properly grounded; otherwise, assessment becomes
unsound and meaningless. Objective, content, method, tool, criterion,
recording, procedure, feedback, and judgment are some significant factors
that must be considered to undertake quality assessment.
Objective
Upon completion of the unit, the students can discuss the fundamental
concepts, principles, purposes, roles and classifications of assessment, as
well as align the assessment methods to learning targets.
Pre-discussion
Study the picture in Figure 1. Has this something to do with assessment?
What are your comments?
What to Expect?
At the end of the lesson, the students can:
1. make a personal definition of assessment;
2. compare assessment with measurement and evaluation;
3. discuss testing and grading;
4. explain the different principles in assessing learning;
5. relate an experience as a student or pupil related to each principle;
6. comment on the tests administered by the past teachers; and
7. perform simple evaluation.
What is assessment?
Definitions of assessment can be gathered from varied sources.
Meaning of Learning
We all know that the human brain is immensely complex and still
somewhat of a mystery. It follows, then, that learning, as a primary function of
the brain, is understood in many different senses.
To provide you sufficient insight into the term, here are several ways in
which learning can be described:
1. "A change in human disposition or capability that persists over a period of
time and is not simply ascribable to processes of growth." (From The
Conditions of Learning by Robert Gagne)
2. Learning is the relatively permanent change in a person’s knowledge or
behavior due to experience. This definition has three components: 1) the
duration of the change is long-term rather than short-term; 2) the locus of
the change is the content and structure of knowledge in memory or the
behavior of the learner; 3) the cause of the change is the learner’s
experience in the environment rather than fatigue, motivation, drugs,
physical condition or physiologic intervention. (From Learning in
Encyclopedia of Educational Research, Richard E. Mayer)
3. It has been suggested that the term learning defies precise definition
because it is put to multiple uses. Learning is used to refer to (1) the
acquisition and mastery of what is already known about something, (2) the
extension and clarification of the meaning of one's experience, or (3) an
organized, intentional process of testing ideas relevant to problems.
(See Figures 2, 3, and 4.)
You may be thinking that learning to bake cookies and learning
something like Chemistry are not the same at all. In a way, you are right;
however, the information you get from assessing what you have learned is the
same. Brian used what he learned from each batch of cookies to improve the
next batch. You likewise learn from every homework assignment that you
complete, and every quiz you take shows what you still need to study to know
the material.
Models in Assessment
The two most common psychometric theories that serve as frameworks
for assessment and measurement, especially in the determination of the
psychometric characteristics of a measure (e.g., tests, scales), are the
classical test theory (CTT) and the item response theory (IRT).
The CTT, also known as the true score theory, explains that variations
in examinees' performance on a given measure are due to variations in
their abilities. It assumes that an examinee's observed score on a given
measure is the sum of the examinee's true score and some degree of error
in the measurement caused by internal and external conditions. Hence,
the CTT also assumes that all measures are imperfect and that the scores
obtained from a measure could differ from the true score (i.e., the true ability
of an examinee).
The CTT provides an estimation of item difficulty based on the number
of examinees who correctly answer a particular item; items that fewer
examinees answer correctly are considered more difficult. It also provides an
estimation of item discrimination based on
the number of examinees with higher or lower ability to answer a particular
item. If an item is able to distinguish between examinees with higher ability
(i.e., higher total test score) and lower ability (i.e., lower total test score), then
an item is considered to have good discrimination. Test reliability can also be
estimated using approaches from CTT (e.g., Kuder-Richardson 20,
Cronbach’s alpha). Item analysis based on this theory has been the dominant
approach because of the simplicity of calculating the statistics (e.g., item
difficulty index, item discrimination index, item-total correlation).
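The CTT statistics named above can be sketched in a few lines of Python. This is an illustrative example, not part of the module: the score matrix, the function names, and the 27% upper-lower split used for discrimination are all assumptions, and the reliability shown is the Kuder-Richardson 20 formula mentioned in the text.

```python
# Illustrative CTT item analysis. Rows are examinees, columns are items;
# 1 = correct, 0 = incorrect. All data here are hypothetical.

def item_difficulty(scores, item):
    """Difficulty index: proportion of examinees answering the item
    correctly (a higher value means an easier item)."""
    answers = [row[item] for row in scores]
    return sum(answers) / len(answers)

def item_discrimination(scores, item):
    """Upper-lower discrimination index: p(upper 27% by total score)
    minus p(lower 27%)."""
    ranked = sorted(scores, key=sum, reverse=True)
    k = max(1, round(len(ranked) * 0.27))
    upper, lower = ranked[:k], ranked[-k:]
    p_upper = sum(row[item] for row in upper) / k
    p_lower = sum(row[item] for row in lower) / k
    return p_upper - p_lower

def kr20(scores):
    """Kuder-Richardson 20 reliability estimate for dichotomous items."""
    n_items = len(scores[0])
    totals = [sum(row) for row in scores]
    mean = sum(totals) / len(totals)
    var = sum((t - mean) ** 2 for t in totals) / len(totals)
    pq = sum(
        item_difficulty(scores, j) * (1 - item_difficulty(scores, j))
        for j in range(n_items)
    )
    return (n_items / (n_items - 1)) * (1 - pq / var)

scores = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 0, 1],
]
print(item_difficulty(scores, 0))     # proportion correct on item 1
print(item_discrimination(scores, 0)) # upper-lower index for item 1
print(kr20(scores))                   # test reliability estimate
```

The simplicity of these calculations is exactly why the text notes that CTT-based item analysis has been the dominant approach.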
The IRT, on the other hand, analyzes test items by estimating the
probability that an examinee answers an item correctly or incorrectly. One of
the central differences of IRT from CTT is that in IRT, it is assumed that the
characteristics of an item can be estimated independently of the characteristics
or ability of an examinee, and vice versa. Aside from item difficulty and item
discrimination, IRT models can also estimate other item parameters, such as
a guessing parameter.
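As a hedged illustration of the probability estimation the paragraph above describes (not taken from the module; the parameter values are hypothetical), the two-parameter logistic (2PL) IRT model expresses the chance of a correct response as a function of examinee ability and the item's difficulty and discrimination:

```python
import math

def p_correct(theta, a, b):
    """2PL item response function: probability that an examinee of
    ability theta answers correctly an item with discrimination a
    and difficulty b (all on the same latent scale)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# An examinee of average ability (theta = 0) facing an item of average
# difficulty (b = 0) has a 50% chance of answering correctly.
print(p_correct(0.0, 1.0, 0.0))  # 0.5
# A harder item (b = 1.5) lowers that probability for the same examinee.
print(p_correct(0.0, 1.0, 1.5))
```

Because theta and the item parameters enter the function separately, item characteristics can in principle be estimated independently of any particular group of examinees, which is the contrast with CTT drawn above.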
Types of Assessment
The most common types of assessment are diagnostic, formative and
summative, criterion-referenced and norm-referenced, traditional and
authentic. Other experts added ipsative and confirmative assessments.
Pre-assessment or diagnostic assessment
Before creating the instruction, it is necessary to know the kind of
students for whom you are creating it. Your goal is to get to know your
students' strengths and weaknesses and the skills and knowledge they
possess before undergoing the instruction. Based on the data you have
collected, you can create your instruction. Usually, a teacher conducts a
pre-test to diagnose the learners.
Formative assessment
Formative assessment is continuous assessment done several times
during the instructional process for the purpose of improving teaching or
learning (Black & Wiliam, 2003).
Summative assessment
Summative assessments are quizzes, tests, exams, or other formal
evaluations of how much a student has learned throughout a subject. The
goal of this assessment is to get a grade that corresponds to a student’s
understanding of the class material as a whole, such as with a midterm or
cumulative final exam.
Confirmative assessment
When your instruction has been implemented in your classroom, it is
still necessary to find out whether it remains effective over time; a
confirmative assessment, conducted well after the instruction (for example, a
year later), checks whether the instruction is still successful.
Criterion-referenced assessment
It measures students' performance against a fixed set of predetermined
criteria or learning standards (see Figure 6). It checks what students are
expected to know and be able to do at a specific stage of their education.
Criterion-referenced tests are used to evaluate a specific body of knowledge
or skill set; such a test evaluates the curriculum taught in a course. In
practice, these assessments are designed to determine whether students
have mastered the material presented in a specific unit. Each student's
performance is measured based on the subject matter presented (what the
student knows and what the student does not know). Again, all students can
get 100% if they have fully mastered the material.
Ipsative assessment
Principles of Assessment
There are many principles in the assessment of learning. Different
literature provides its own yet closely related set of principles of
assessment. According to David et al. (2020), the following may be
considered core principles in assessing learning:
1. Assessment should have a clear purpose. The methods used in
collecting information should be based on this purpose. The
interpretation of the data collected should be aligned with the purpose
that has been set. This principle is congruent with the outcome-based
education (OBE) principles of clarity of focus and design down.
2. Assessment is not an end in itself. It serves as a means to enhance
student learning. It is not a simple recording or documentation of what
learners know and do not know; collecting information about student
learning must lead to decisions and actions that improve it.
Summary
Assessment
1. What is assessment in learning? What does assessment in learning mean to you personally?
2. Differentiate the following:
2.1. Measurement and evaluation
2.2. Testing and grading
2.3. Formative and summative assessment
2.4. Classical test theory and Item response theory
3. Based on the principles that you have learned, make a simple plan on how you will
undertake your assessment with your future students. Consider 2 principles only.
Principles Plan for applying the principle in your classroom
assessment
1.
2.
Enrichment
Secure a copy of DepEd Order No. 8, s. 2015 on the Policy Guidelines on
Classroom Assessment for the K to 12 Basic Education Program. Study
the policies and be ready to clarify any provisions during G-class. You can
access the Order from this link: https://www.deped.gov.ph/2015/04/01/do-
8-s-2015-policy-guidelines-on-classroom-assessment-for-the-k-to-12-
basic-education-program/
Read DepEd Order No. 5, s. 2013 (Policy Guidelines on the
Implementation of the School Readiness Year-end Assessment (SReYA)
for Kindergarten. (Please access through
https://www.deped.gov.ph/2013/01/25/do-5-s-2013-policy-guidelines-on-
the-implementation-of-the-school-readiness-year-end-assessment-sreya-
for-kindergarten/).
Questions
1. What assessment is cited in the Order? What is the purpose of giving
such assessment?
2. How would you classify the assessment in terms of its nature? Justify.
3. What is the relevance of this assessment to students, parents and
teachers and the school?
References
Pre-discussion
To achieve the intended learning outcomes of this lesson, one is
required to understand the basic concepts, theories, and principles in
assessing the learning of students. Should these not yet be clear and
understood, a thorough review of the previous chapter is advised.
What to Expect?
At the end of the lesson, the students can:
1. articulate the purpose of classroom assessment;
2. tell the difference between the Bloom's Taxonomy and the Revised
Bloom's Taxonomy in stating learning objectives;
3. apply the Revised Bloom’s Taxonomy in writing learning objectives;
4. discuss the importance of learning targets in instruction;
5. formulate learning targets; and
6. match the assessment methods with specific learning
objectives/targets.
Assessment for Learning (Formative Assessment) vs. Assessment of
Learning (Summative Assessment)
- Formative: Checks learning to determine what to do next and then provides
suggestions of what to do - teaching and learning are indistinguishable from
assessment. Summative: Checks what has been learned to date.
- Formative: Is designed to assist educators and students in improving
learning. Summative: Is designed for the information of those not directly
involved in daily learning and teaching (school administration, parents, school
board, Alberta Education, post-secondary institutions) in addition to educators
and students.
- Formative: Is used continually by providing descriptive feedback.
Summative: Is presented in a periodic report.
- Formative: Usually uses detailed, specific and descriptive feedback - in a
formal or informal report. Summative: Usually compiles data into a single
number, score or mark as part of a formal report.
- Formative: Is not reported as part of an achievement grade. Summative: Is
reported as part of an achievement grade.
- Formative: Usually focuses on improvement, compared with the student's
"previous best" (self-referenced, making learning more personal).
Summative: Usually compares the student's learning either with other
students' learning (norm-referenced, making learning highly competitive) or
the standard for a grade level (criterion-referenced, making learning more
collaborative and individually focused).
- Formative: Involves the student. Summative: Does not always involve the
student.
Adapted from Ruth Sutton, unpublished document, 2001, in Alberta
Assessment Consortium, Refocus: Looking at Assessment for Learning
(Edmonton, AB: Alberta Assessment Consortium, 2003), p. 4.
Analyze - Breaking down information into parts. Sample verbs: analyze,
calculate, examine, test, compare, differentiate, organize, classify. Sample
task: Classify the following chemical elements based on some
categories/areas.
Apply - Applying the facts, rules, concepts and ideas in another context.
Sample verbs: apply, employ, practice, relate, use, implement, carry out,
solve. Sample task: Solve the following problems using the different
measures of central tendency.
Understand - Understanding what the information means. Sample verbs:
describe, determine, interpret, translate, paraphrase, explain. Sample task:
Explain the causes of malnutrition in the country.
Remember - Recognizing and recalling facts. Sample verbs: identify, list,
name, underline, recall, retrieve, locate. Sample task: Name the 7th president
of the Philippines.
LEARNING TARGETS
“Students who can identify what they are learning significantly outscore
those who cannot.” – Robert Marzano
The metaphor that Connie Moss and Susan Brookhart use to describe
learning targets in their Educational Leadership article, "What Students Need
to Learn," is that of a global positioning system (GPS). Much like a GPS
communicates timely information about where you are, how far and how long
until your destination, and what to do when you make a wrong turn, a learning
target provides a precise description of the learning destination. Learning
targets tell students what they will learn, how deeply they will learn it, and
how they will demonstrate their learning.
Learning targets describe in student-friendly language the learning to
occur in the day’s lesson. Learning targets are written from the students’ point
of view and represent what both the teacher and the students are aiming for
during the lesson. Learning targets also include a performance of
understanding, or learning experience, that provides evidence to answer the
question “What do students understand and what are they able to do?”
As Moss and Brookhart write, while a learning target is for a daily
lesson, “Most complex understandings require teachers to scaffold student
understanding across a series of interrelated lessons.” In other words, each
learning target is a part of a longer, sequential plan that includes short and
long-term goals.
McMillan (2014) defined learning targets as statements of student
performance for a relatively restricted type of learning outcome that will be
achieved in a single lesson or a few days; they contain what students should
know, understand, and be able to do at the end of the instruction, and the
criteria for judging the level of demonstrated performance. Learning targets
are more specific and clearer than educational goals, standards, and learning
objectives.
Teacher Observation
Teacher observation has been accepted readily in the past as a
legitimate source of information for recording and reporting student
demonstrations of learning outcomes. As the student progresses to later
years of schooling, less and less attention typically is given to teacher
observation and more and more attention typically is given to formal
assessment procedures involving required tests and tasks taken under explicit
constraints of context and time. However, teacher observation is capable of
providing substantial information on student demonstration of learning
outcomes at all levels of education.
For teacher observation to contribute to valid judgments concerning
student learning outcomes, evidence needs to be gathered and recorded
systematically. Systematic gathering and recording of evidence requires
preparation and foresight. Teacher observation can be characterised as two
types: incidental and planned.
Incidental observation occurs during the ongoing (deliberate) activities of
teaching and learning and the interactions between teacher and students.
In other words, an unplanned opportunity emerges, in the context of
classroom activities, where the teacher observes some aspect of
individual student learning. Whether incidental observation can be used
as a basis for formal assessment and reporting may depend on the
records that are kept.
Planned observation involves deliberate planning of an opportunity for the
teacher to observe specific learning outcomes.
Student Self-Assessment
One form of formative assessment is self-assessment or self-reflection
by students. Self-reflection is the evaluation or judgment of the worth of one’s
performance and the identification of one’s strengths and weaknesses with a
view to improving one’s learning outcomes, or more succinctly, reflecting on
and monitoring one’s own work processes and/or products (Klenowski, 1995).
Student self-assessment has long been encouraged as an educational and
learning strategy in the classroom, and is both popular and positively
regarded by the general education community (Andrade, 2010).
Similarly, McMillan and Hearn (2008) described self-assessment as a
process by which students 1) monitor and evaluate the quality of their thinking
and behavior when learning and 2) identify strategies that improve their
understanding and skills. That is, self-assessment occurs when students
judge their own work to improve performance as they identify discrepancies
between current and desired performance. This aspect of self-assessment
aligns closely with standards-based education, which provides clear targets
and criteria that can facilitate student self-assessment. The pervasiveness of
standards-based instruction provides an ideal context in which these clear-cut
benchmarks for performance and criteria for evaluating student products,
when internalized by students, provide the knowledge needed for self-
assessment. Finally, self-assessment identifies further learning targets and
instructional strategies (correctives) students can apply to improve
achievement.
Summary
In an educational setting, the purpose of assessment may be classified in
terms of assessment of learning, assessment for learning, and
assessment as learning.
Assessment OF learning is held at the end of a subject or a course to
determine performance. It is equivalent to summative assessment.
Assessment FOR learning is done repeatedly during instruction to check
the learners’ progress and the teacher’s strategies so that interventions or
changes can be made.
Assessment AS learning is done to develop the learners’ independence
and self-regulation.
Assessment
1. Describe the 3 purposes of classroom assessment by completing the
matrix below.
Assessment OF learning | Assessment FOR learning | Assessment AS learning
WHAT?
WHY?
WHEN?
Sample statements
Enrichment
Open DepEd’s K to 12 Curriculum Guide from this link:
https://www.deped.gov.ph/k-to-12/about/k-to-12-basic-education-
curriculum/grade-1-to-10-subjects/ and familiarize yourself with the
content standards, performance standards, and competencies.
Choose a specific lesson for a subject area, and grade level that you want
to teach in the future. Prepare an assessment plan using the matrix.
Subject
Grade level
Performance standards
Specific lesson
Learning targets
Assessment task/activity
How will the results improve your instruction?
References
Andrade, H. (2010). Students as the definitive source of formative
assessment: Academic self-assessment and the self-regulation of
learning. In H. Andrade & G. Cizek (Eds.), Handbook of formative
assessment (pp. 90–105). New York, NY: Routledge.
Clayton, Heather. “Power Standards: Focusing on the Essential.” Making the
Standards Come Alive! Alexandria, VA: Just ASK Publications, 2016.
Access at www.justaskpublications.com/just-ask-resource-center/e-
newsletters/msca/power-standards/
David et al. (2020). Assessment in Learning 1. Manila: Rex Book Store.
De Guzman, E. and Adamos, J. (2015). Assessment of Learning 1. Quezon
City: Adriana Publishing Co., Inc.
EL Education (2020). Students Unpack a Learning Target and Discuss
Academic Vocabulary. [Video]. https://vimeo.com/44052219
Hattie, John. Visible Learning for Teachers: Maximizing Impact on Learning.
New York: Routledge, 2012.
Klenowski, V. (1995). Student self-evaluation processes in student-centred
teaching and learning contexts of Australia and England. Assessment
in Education: Principles, Policy & Practice, 2(2).
Maxwell, Graham S. (2001). Teacher Observation in Student Assessment.
(Discussion Paper). The University of Queensland.
Moss, Connie and Susan Brookhart. Learning Targets: Helping Students Aim
for Understanding in Today’s Lesson. Alexandria: ASCD, 2012.
Navarro, L., Santos, R. and Corpuz, B. (2017). Assessment of Learning 1 (3rd
ed.). Quezon City: Lorimar Publishing, Inc.
Pre-discussion
Ask the students about their experiences when they took the National
Achievement Test (NAT) during their elementary and high school days. Who
administered it? How did they answer it? What do they think was the
purpose of the NAT? What about their experiences in taking quarterly tests or
quizzes? What other assessments or tests did they take before? What are
their notable experiences relative to taking tests?
What to Expect?
At the end of the lesson, the students can:
1. compare the following forms of assessment: educational vs.
psychological, teacher-made vs. standardized, selected-response vs.
constructed-response, achievement vs. aptitude, and power vs. speed;
2. give examples of each classification of test;
3. illustrate situations on the use of different classifications of
assessment; and
4. decide on the kind of assessment to be used.
Classifications of Assessment
The different forms of assessment are classified according to purpose, form,
function, interpretation of learning, ability, and kind of learning.
Classification | Type
Purpose | Educational and Psychological
Form | Paper-and-pencil and Performance-based
Function | Teacher-made and Standardized
Performance-based Assessment
The following six (6) types of activities provide good starting points for
assessments in performance-based learning.
1. Presentations
One easy way to have students complete a performance-based activity
is to have them do a presentation or report of some kind. This activity could
be done by students individually, which takes more time, or in collaborative
groups.
The basis for the presentation may be one of the following:
Providing information
Teaching a skill
Reporting progress
Persuading others
Students may choose to add visual aids, such as a PowerPoint or Google
Slides presentation, to help illustrate elements of their speech.
Presentations work well across the curriculum as long as there is a clear set
of expectations for students to work with from the beginning.
2. Portfolios
Student portfolios can include items that students have created and
collected over a period of time. One example is the art portfolio that students
prepare when applying to art programs in college. Another example is when
students create a portfolio of their written work that shows how they have
progressed from the beginning to the end of a class. The writing in a portfolio
can be from any discipline or a combination of disciplines.
Some teachers have students select the items they feel represent
their best work to be included in a portfolio. The benefit of an activity like this
is that it is something that grows over time and is therefore not just completed
and forgotten. A portfolio can provide students with a lasting selection of
artefacts that they can use later in their academic career.
Reflections may be included in student portfolios in which students may
make a note of their growth based on the materials in the portfolio.
3. Performances
Dramatic performances are one kind of collaborative activity that can
be used as a performance-based assessment. Students can create, perform,
and/or provide a critical response. Examples include dance, recitals, dramatic
enactments, and prose or poetry interpretation.
This form of performance-based assessment can take time, so there
must be a clear pacing guide. Students must be provided time to address the
demands of the activity; resources must be readily available and meet all
safety standards. Students should have opportunities to draft stage work and
practice.
Developing the criteria and the rubric and sharing these with students
before evaluating a dramatic performance is critical.
4. Projects
Projects are commonly used by teachers as performance-based
activities. They can include everything from research papers to artistic
representations of information learned. Projects may require students to apply
their knowledge and skills while completing the assigned task. They can be
aligned with the higher levels of creativity, analysis, and synthesis.
Students might be asked to complete reports, diagrams, and maps.
Teachers can also choose to have students work individually or in groups.
6. Debates
A debate in the classroom is one form of performance-based learning
that teaches students about varied viewpoints and opinions. Skills associated
with debate include research, media and argument literacy, reading
comprehension, evidence evaluation, public speaking, and civic skills.
Standardized Test
A standardized test is a test that is given to students in a very
consistent manner. It means that the questions on the test are all the same,
the time given to each student is also the same, and the way in which the test
is scored is the same for all students. Standardized tests are constructed by
experts along with explicit instructions for administration, standard scoring
procedures, and a table of norms for interpretation.
Thus, a standardized test is administered and scored in a consistent or
"standard" manner. These tests are designed in such a way that the
questions, conditions for administering, scoring procedures, and
interpretations are consistent.
Any test in which the same test is given in the same manner to all test
takers, and graded in the same manner for everyone, is a standardized test.
Aptitude Test
Unlike achievement tests, which are concerned with a person's level of
skill or knowledge at a given time, aptitude tests are instead focused on
determining how capable a person might be of performing a certain task.
An aptitude test is designed to assess what a person is capable of
doing or to predict what a person is able to learn or do given the right
education and instruction. It represents a person's level of competency to
perform a certain type of task. Such aptitude tests are often used to assess
academic potential or career suitability and may be used to assess either
mental or physical talent in a variety of domains.
Some examples of aptitude tests include:
• A test assessing an individual's aptitude to become a fighter pilot
• A career test evaluating a person's capability to work as an air traffic
controller
In other tests, the time limits are short enough to make rate of work an
important factor in the score; these are called speed tests.
In the context of educational measurement, a power test usually refers
to a measurement tool composed of several items and applied without a
relevant time limit. The respondents have a very long time, or even unlimited
time, to solve each of the items, so they can usually attempt all of them. The
total score is often computed as the number of items correctly answered, and
individual differences in the scores are attributed to differences in the ability
under assessment, not to differences in basic cognitive abilities such as
processing speed or reaction time.
An example of a speed test is a typing test in which examinees are
required to type correctly as many words as possible given a limited amount
of time. An example of a power test is one developed by the National
Council of Teachers of Mathematics, which determines the ability of the
examinees to use data to reason and be creative, and to formulate, solve,
and reflect critically on the problems provided.
Summary
In this lesson, we identified and distinguished from one another the different
classifications of assessment. We learned when to use educational and
psychological assessment, or paper-and-pencil and performance-based
assessment. We were also able to differentiate teacher-made and
standardized tests, achievement and aptitude tests, as well as speed and
power tests.
Assessment
1. Which classification of assessment is commonly used in the classroom
setting? Why?
2. To demonstrate understanding, try giving more examples for each type of assessment.
Type Examples
Educational
Psychological
Paper and pencil
Performance-based
Teacher-made
Standardized
Achievement
Aptitude
Speed
Power
Norm-referenced
Criterion-referenced
3. Match the learning target with the appropriate assessment methods.
Check if the type of assessment is appropriate. Be ready to justify.
Learning targets | Selected-response | Essay | Performance Task | Teacher observation | Self-assessment
Example: Exhibit proper dribbling of a basketball | | | √ | √ | √
1. Identify parts of a microscope and its functions
2. Compare the methods of assessment
3. Arrange the eating utensils on a table
4. Perform the dance steps in “Pandanggo sa Ilaw”
5. Define assessment
6. Compare and contrast testing and grading
7. List down all the Presidents of the Philippines
8. Find the speed of a car
9. Recite the mission of SKSU
10. Prepare a lesson plan in Mathematics
Enrichment
Check the varied products of Center for Educational Measurement (CEM)
as regards standardized tests. Access it through this link:
https://www.cem-inc.org.ph/products
Try taking a free Personality Test available online. You can also try an IQ
test. Share the results with the class.
References
Aptitude Tests. Retrieved from https://www.aptitude-test.com/aptitude-
tests.html
Cherry, Kendra (2020, February 06). How Achievement Tests Measure What
People Have Learned. Retrieved from
https://www.verywellmind.com/what-is-an-achievement-test-2794805
Classroom Assessment. Retrieved from
https://fcit.usf.edu/assessment/selected/responseb.html
David et al. (2020). Assessment in Learning 1. Manila: Rex Book Store.
De Guzman, E. and Adamos, J. (2015). Assessment of Learning 1. Quezon
City: Adriana Publishing Co., Inc.
Improving your Test Questions. https://citl.illinois.edu/citl-101/measurement-
evaluation/exam-scoring/improving-your-test-questions?src=cte-
migration-map&url=%2Ftesting%2Fexam%2Ftest_ques.html
Navarro, L., Santos, R. and Corpuz, B. (2017). Assessment of Learning 1 (3rd
ed.). Quezon City: Lorimar Publishing, Inc.
University of Lethbridge (2020). Creating Assessments. Retrieved from
https://www.uleth.ca/teachingcentre/exams-and-assignments
CHAPTER 3
DEVELOPMENT AND ENHANCEMENT OF TEST
Overview
This chapter deals with the process and mechanics of developing a
written test, which is understandably a teacher-made type. As future professional
teachers, one has to be competent in the selection of the learning objectives
or outcomes, the preparation of a table of specifications (TOS), the guidelines in
writing varied written test formats, and the writing of the test itself. Adequate
knowledge of TOS construction is indispensable in formulating a test that is valid
in terms of content and construct. Also, a complete understanding of the
rules and guidelines in writing a specific test format helps ensure an
acceptable and unambiguous test that is fair to the learners. In addition,
reliability and validity, two important characteristics of a test, shall likewise
be covered to guarantee quality. For test item enhancement, topics such as
the difficulty index, the index of discrimination, and even distracter analysis are
introduced.
Objective
Upon completion of the unit, the students can demonstrate their
knowledge, understanding and skills in planning, developing and enhancing a
written test.
Pre-discussion
The setting of learning objectives for the assessment of a course or
subject and the construction of a table of specifications for a classroom
test require specific skills and experience. To successfully perform these
tasks, a pre-service teacher should be able to distinguish the different
levels of cognitive behavior and identify the appropriate assessment
method for them. It is assumed that in this lesson, the competencies for
instruction that are cognitive in nature are the ones identified as the targets in
developing a written test, which should be reflected in the test’s table of
specifications to be created.
What to Expect?
At the end of the lesson, the students can:
1. define the necessary instructional outcomes to be included in a written
test;
2. describe what is a table of specifications (TOS) and its formats;
3. prepare a TOS for a written test; and
4. demonstrate the systematic steps in making a TOS.
Instructional objectives play a crucial role in teaching and assessment.
They provide teachers the focus and direction on how the course
is to be handled, particularly in terms of course content, instruction, and
assessment. On the other hand, they provide the students with the reasons
and motivation to study and endure. They provide students the opportunities
to be aware of what they need to do to be successful in the course, take
control and ownership of their progress, and focus on what they should be
learning. Setting objectives for assessment is the process of establishing
direction to guide both the teacher in teaching and the student in learning.
For better understanding, Bloom has the following description for each
cognitive domain level:
Knowledge - Remember previously learned information
Comprehension - Demonstrate an understanding of the facts
Application - Apply knowledge to actual situations
Analysis - Break down objects or ideas into simpler parts and find
evidence to support generalizations
Synthesis - Compile component ideas into a new whole or propose
alternative solutions
Evaluation - Make and defend judgments about the value of ideas or
materials based on a set of criteria
Bloom’s Definitions
Remembering - Exhibit memory of previously learned material by recalling
facts, terms, basic concepts, and answers.
Understanding - Demonstrate understanding of facts and ideas by
organizing, comparing, translating, interpreting, giving descriptions, and
stating main ideas.
Applying - Solve problems to new situations by applying acquired
knowledge, facts, techniques and rules in a different way.
Analyzing - Examine and break information into parts by identifying
motives or causes. Make inferences and find evidence to support
generalizations.
Evaluating - Present and defend opinions by making judgments about
information, validity of ideas, or quality of work based on a set of criteria.
Creating - Compile information together in a different way by combining
elements in a new pattern or proposing alternative solutions
Table of Specifications
A table of specifications (TOS), sometimes called a test blueprint, is
a tool used by teachers to design a written test. It is a table that maps out the
test objectives, contents, or topics covered by the test; the levels of cognitive
behavior to be measured; the distribution of items, number, placement, and
weights of test items; and the test format. It helps ensure that the course’s
intended learning outcomes, assessments, and instruction are aligned.
Generally, the TOS is prepared before a test is created. However, it is
ideal to prepare one even before the start of instruction. Teachers need to
create a TOS for every test that they intend to develop. The TOS is
important because it does the following:
Ensures that the instructional objectives and what the test captures
match
Ensures that the test developer will not overlook details that are
considered essential to a good test
Makes developing a test easier and more efficient
Ensures that the test will sample all important content areas and
processes
Is useful in planning and organizing
Offers an opportunity for teachers and students to clarify achievement
expectations.
1. Determine the test objectives. Select only the objectives that can be best
captured by a written test. There are objectives that are
not meant for a written test. For example, if you test the psychomotor
domain, it is better to do a performance-based assessment. There are also
cognitive objectives that are sometimes better assessed through
performance-based assessment. Those that require the demonstration or
creation of something tangible like projects would also be more
appropriately measured by performance-based assessment. For a written
test, you can consider cognitive, ranging from remembering to creating of
ideas that could be measured using common formats for testing, such as
multiple choice, alternative response test, matching type, and even essays
or open-ended tests.
2. Determine the coverage of the test. The next step in creating the TOS is
to determine the contents of the test. Only topics or contents that have
been discussed in class and are relevant should be included in the test.
3. Calculate the weight for each topic. Once the test coverage is
determined, the weight of each topic covered in the test is determined. The
weight assigned per topic in the test is based on the relevance and the
time spent to cover each topic during instruction. The percentage of time
for a topic in a test is determined by dividing the time spent on that topic
by the total time spent for all the topics covered in the test. For example,
for a test on the Theories of Personality for a General Psychology 101
class, the teacher spent from half an hour to one and a half hours of class
sessions per topic. As such, the weight for each topic is as follows:
4. Determine the number of items for the whole test. To determine the
number of items to be included in the test, the amount of time needed to
answer the items is considered. As a general rule, students are given 30-
60 seconds for each item in test formats with choices. For a one-hour class,
this means that the test should not exceed 60 items. However, because
you also need to give time for test paper/booklet distribution and giving
instructions, the number of items should be less, maybe just 50 items.
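The rule of thumb above can be sketched as a small computation. This is only an illustration: the function name and the ten-minute allowance for distribution and instructions are assumptions, not fixed rules.

```python
# Sketch: estimate how many choice-type items fit in a class period,
# using the 30-60 seconds-per-item rule of thumb described above.
# The 10-minute administration allowance is an illustrative assumption.

def max_items(period_minutes: int, seconds_per_item: int = 60,
              admin_minutes: int = 10) -> int:
    """Return the number of items that fit in the remaining testing time."""
    testing_seconds = (period_minutes - admin_minutes) * 60
    return testing_seconds // seconds_per_item

print(max_items(60))  # 50 items for a one-hour class at 60 s per item
```

At 30 seconds per item the same one-hour period would accommodate about 100 items, which is why the rule is stated as a range.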
5. Determine the number of items per topic. To determine the number of
items per topic, the weights per topic are considered. Thus, using
the examples above, for a 50-item final test, Theory & Concepts, Humanistic
Theories, Cognitive Theories, Behavioral Theories, and Social Learning Theories
will each have 5 items; Trait Theories, 10 items; and Psychoanalytic Theories,
15 items.
Topic | Percent of Time (Weight) | No. of Items
Theory & Concepts | 10.0 | 5
Psychoanalytic Theories | 30.0 | 15
Trait Theories | 20.0 | 10
Humanistic Theories | 10.0 | 5
Cognitive Theories | 10.0 | 5
Behavioral Theories | 10.0 | 5
Social Learning Theories | 10.0 | 5
Total | 100 | 50 items
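Steps 3 and 5 can be sketched together in a short computation. The hours per topic below are an assumed breakdown consistent with the weights shown in the table; only the resulting weights and item counts come from the example itself.

```python
# Sketch: derive topic weights from time spent (step 3) and allocate
# items per topic for a 50-item test (step 5). The hours below are an
# assumed breakdown consistent with the weights in the table above.

hours_per_topic = {
    "Theory & Concepts": 0.5,
    "Psychoanalytic Theories": 1.5,
    "Trait Theories": 1.0,
    "Humanistic Theories": 0.5,
    "Cognitive Theories": 0.5,
    "Behavioral Theories": 0.5,
    "Social Learning Theories": 0.5,
}
total_hours = sum(hours_per_topic.values())  # 5.0 hours of instruction
total_items = 50

for topic, hours in hours_per_topic.items():
    weight = hours / total_hours             # step 3: percent of time
    items = round(weight * total_items)      # step 5: items per topic
    print(f"{topic}: {weight:.0%} -> {items} items")
```

Running this reproduces the table: 10% of the time yields 5 items, 20% yields 10, and 30% yields 15, for a total of 50.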
2. Two-Way TOS. A two-way TOS reflects not only the content, time spent,
and number of items but also the levels of cognitive behavior targeted per
test content based on the theory behind cognitive testing. For example, the
common framework for testing at present in the DepEd Classroom
Assessment Policy is the Revised Bloom’s Taxonomy (DepEd, 2015). One
advantage of this format is that it allows one to see the levels of cognitive
skills and dimensions of knowledge that are emphasized by the test. It also
shows the framework of assessment used in the development of the test.
Nonetheless, this format is more complex than the one-way format.
Content | Time Spent | No. & Percent of Items | KD* | Level of Cognitive Behavior, Item Format, No. and Placement of Items (R / U / AP / AN / E / C)
Theories and Concepts | 0.5 hours | 5 (10.0%) | F: I.3, #1-3; C: I.2, #4-5
Psychoanalytic Theories | F: I.2, #6-7; C: I.2, #8-9 and I.2, #10-11; P: I.2, #12-13 and I.2, #14-15; M: I.3, #16-18; II.1, #41; II.1, #42
Others
Scoring | 1 point per item | 2 points per item | 3 points per item
Overall Total | 5 | 50 (100.0%) | 20 | 20 | 10
Another presentation is shown below:
3. Three-Way TOS. This type of TOS reflects the features of one-way and
two-way TOS. One advantage of this format is that it challenges the test
writer to classify objectives based on the theory behind the assessment. It
also shows the variability of thinking skills targeted by the test. However, it
takes much longer to develop this type of TOS.
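One hypothetical way to picture a two-way TOS is as a mapping of content to cognitive level to item count, which also makes it easy to check that the blueprint adds up to the intended test length. The topic names and counts below are illustrative, not taken from a real syllabus.

```python
# Hypothetical sketch: a two-way TOS as a nested mapping of
# content -> cognitive level -> number of items, with a check that the
# blueprint sums to the intended test length. All values are illustrative.

two_way_tos = {
    "Theories and Concepts":   {"Remember": 3, "Understand": 2},
    "Psychoanalytic Theories": {"Remember": 2, "Understand": 4,
                                "Apply": 4, "Analyze": 5},
}

def total_items(tos: dict) -> int:
    """Sum the item counts across all contents and cognitive levels."""
    return sum(sum(levels.values()) for levels in tos.values())

print(total_items(two_way_tos))  # 20
```

A check like this catches the common blueprint error where the per-cell counts no longer match the planned total after revisions.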
Summary
Bloom's taxonomy is a set of three hierarchical models used to classify
learning objectives into levels of complexity and specificity. The three lists
cover the learning objectives in cognitive, affective and psychomotor
domains.
The cognitive domain list has been the primary focus of most traditional
education and is frequently used to structure curriculum learning
objectives, assessments and activities.
In the original version of the taxonomy, the cognitive domain is broken into
the following six levels of objectives, namely: knowledge, comprehension,
application, analysis, synthesis and evaluation.
In the 2001 revised edition of Bloom's taxonomy, the levels are slightly
different: Remember, Understand, Apply, Analyze, Evaluate, Create
(replacing Synthesize).
Knowledge involves recognizing or remembering facts, terms, basic
concepts, or answers without necessarily understanding what they mean.
Comprehension involves demonstrating an understanding of facts and
ideas by organizing, comparing, translating, interpreting, giving
descriptions, and stating the main ideas.
Application involves using acquired knowledge—solving problems in new
situations by applying acquired knowledge, facts, techniques and rules.
Learners should be able to use prior knowledge to solve problems, identify
connections and relationships and how they apply in new situations.
Analysis involves examining and breaking information into component
parts, determining how the parts relate to one another, identifying motives
or causes, making inferences, and finding evidence to support
generalizations.
Enrichment
1. Read the research article titled, “Classroom Test Construction: The Power
of a Table of Specifications” from
https://www.researchgate.net/publication/257822687_Classroom_Test_Co
nstruction_The_Power_of_a_Table_of_Specifications.
2. Watch the video titled, “How to use an automated Table of Specifications:
TOS Made Easy 2019.” Accessible from https://www.youtube.com/watch?
v=75W_N4UKP3A
3. Explore the post of Jessica Shabatura (September 27, 2013) on “Using
Bloom’s Taxonomy to Write Effective Learning Objectives.” Use this link
https://tips.uark.edu/using-blooms-taxonomy/.
4. Watch the video titled, “How to write learning objectives using Bloom’s
Taxonomy.” Accessible from https://www.youtube.com/watch?
v=nq0Ou1li_p0
Assessment
1. Answer the following questions:
1. When planning for a test, what should you do first?
2. Are all instructional objectives measured by a paper-pencil test?
3. When constructing a TOS where objectives are set without classifying
them according to their cognitive behavior, what format do you use?
4. If you designed a two-way TOS for your test, what does this format
have?
5. Why would a teacher consider a three-way TOS rather than the other
formats?
2. To check whether you have learned the important information about
planning the test, please provide your answers to the questions given in
the graphical representation.
2. Sample 2 in Science
Check (√) the competencies appropriate for the given test format or method
Competencies | Appropriate for Objective Type of Test Format | Appropriate for Constructed Test Format | Appropriate for Methods other than a Written Test
3. Sample 3 in Language
Check (√) the competencies appropriate for the given test format or method.
Competencies | Appropriate for Objective Type of Test Format | Appropriate for Constructed Test Format | Appropriate for Methods other than a Written Test
1. Use words that describe persons, places, animals, and events
2. Draw conclusions based on picture-stimuli/passages
3. Write a different story ending
4. Write a simple friendly letter observing the correct format
5. Compose riddles, slogans and announcements from the given stimuli
4. For the table of specifications, you can apply what you have learned by creating a
two-way TOS for the final exams of your class. Take into consideration the
content or topic; the time spent for each topic; the knowledge dimension; and the
item format, number, and placement for each level of cognitive behavior. An example
of a TOS for a long exam for an Abnormal Psychology class is shown below. Some
parts are missing. Complete the TOS based on the given information.
Content | Time Spent | # of Items | KD* | Level of Cognitive Behavior, Item Format, No. and Placement of Items (R / U / AP / AN / E / C)
Disorders Usually First Diagnosed in Infancy, Childhood or Adolescence | 3 hours | ? | F | I.10, #1-10; I.10, #?; I.10, ?
Cognitive Disorders | 3 | ? | C | I.10, ?; I.10, #?; I.10, #?
Substance Related Disorders | 1 | 10% (10) | P | I.5, #?; I.5, #?
Schizophrenia and other Psychotic Disorders | 3 | ? | M | I.10, #?; I.10, #?; I.10, #?
Total | ? | ? | ? | ?; ?; ?
Overall Total | 10 | 100 (100%) | | 45 (45%); 25 (25%); 30 (30%)
5. Test Yourself
Choose the letter of the correct answer to every item given.
1. The instructional objective focuses on the development of learners’
knowledge. Can this objective be assessed using the multiple-choice
format?
A. No, this objective requires an essay format.
B. No, this objective is better assessed using matching type test.
C. Yes, as multiple-choice is appropriate in assessing knowledge.
D. Yes, as multiple-choice is the most valid format when assessing
learning.
2. You prepared an objective test format for your quarterly test in
Mathematics. Which of the following could NOT have been your test
objective?
A. Interpret a line graph
B. Construct a line graph
C. Compare the information presented in a line graph
D. Draw conclusions from the data presented in a line graph
3. Teacher Lanie prepared a TOS as her guide in developing a test. Why
is this necessary?
A. To guide the planning of instruction
B. To satisfy the requirements in developing a test
performance level you are at for (1) setting test objectives and (2) creating a table
of specifications.
Level | Performance Benchmark | Setting Test Objectives | Creating Table of Specifications
Proficient | I know them very well. I can teach others where and when to use them appropriately. | 4 | 4
Master | I can do it by myself, though I sometimes make mistakes. | 3 | 3
Developing | I am getting there, though I still need help to be able to perfect it. | 2 | 2
Novice | I cannot do it by myself. I need help to plan for my tests. | 1 | 1
Based on your self-assessment above, choose from the following tasks to help you
enhance your skills and competencies in setting course objectives and in designing
a table of specifications.
Level Possible Tasks
Proficient Help or mentor peer or classmates who are having difficulty in setting
test objectives and designing table of specifications.
Master Examine the areas that you need to improve on and address them
immediately. Benchmark with the test objectives and TOS developed
by your peers/classmates who are known to be proficient in this area.
Educator’s Feedback
In an interview with a high school teacher, this is what he shared on his
practice when preparing a test:
“When I plan my test, I first design its TOS, so I know what I should
cover. I usually prepare a two-way TOS. Actually, because I have been
teaching the same course for many years now, I have come to a point that
all my tests have their two-way TOS ready to be shown to anybody, most
specially my students. Hence, even at the start of term, I know what I
should teach and how they would be assessed. I know those topics that
are appropriately assessed through a written test. Weeks before the test
is given, I usually give the TOS to my students, so they have a guide in
preparing for the test. I allot time in my class for my students to examine
the TOS of the test for them to check if there were topics not actually
taught in the class. My students usually are surprised when I do this as
they don’t normally see the TOS of their teacher’s test. But I do this as I
want them to be successful. I find it fair for them to know how much
weight is given to every topic covered in the test. Most often, the outcome
of the test is good as almost all, if not all, of my students would pass my
test.”
ERNIE C. CERADO, PhD/MA. DULCE P. DELA CERNA, MIE
SULTAN KUDARAT STATE UNIVERSITY
Pre-discussion
The construction of good tests requires specific skills and experience.
To be able to successfully demonstrate your knowledge and skills in
constructing traditional types of tests that are most applicable to a particular
learning outcome, you should be able to distinguish the different test types
and formats, and understand the process and requirements in setting learning
objectives and outcomes and in preparing the table of specifications. For
proper guidance in this lesson, the performance tasks and success indicators
are presented below.
What to Expect?
At the end of the lesson, the students can:
1. describe the characteristics of selected-response and constructed-
response tests;
2. classify whether a test is selected-response or constructed-response;
3. identify the test format that is most appropriate to a particular learning
outcome/target;
4. apply the general guidelines in constructing test items;
5. prepare a written test based on the prepared TOS; and
6. evaluate a given teacher-made test based on guidelines.
As such, it is important that assessment tasks or tests are meaningful, promote
deep learning, and fulfill the criteria and principles of test construction.
There are many ways by which learners can demonstrate their
knowledge and skills and show evidence of their proficiencies at the end of a
lesson, unit, or subject. While authentic or performance-based assessments
have been advocated as better and more appropriate methods for
assessing learning outcomes, particularly because they assess higher-order
thinking skills (HOTS), traditional written assessment methods, such as multiple-
choice tests, are also considered appropriate and efficient classroom
assessment tools for some types of learning targets. This is especially true for
large classes and when test results are needed immediately for some
educational decisions. Traditional tests are also deemed reliable and, when
well constructed, can exhibit strong content and construct validity.
To learn or enhance your skills in developing good and effective test
items for a particular test format, you need to possess adequate knowledge
of the different test formats; of how and when to choose the format that
best measures the identified learning objectives and desired learning
outcomes of your subject; and of how to construct good and effective items
for each format.
3. Is the test matched or aligned with the course’s DLOs and the course
contents or learning activities?
they are limited when assessing learning outcomes that involve more
complex and higher-level thinking skills. Selected-response tests include:
Multiple Choice Test. It is the most commonly used format in formal
testing and typically consists of a stem (problem), one correct or best
alternative (correct answer), and three or more incorrect or inferior alternatives
(distractors).
True-False or Alternative Response Test. It generally presents a
statement, and the learner decides whether the statement is true
(accurate/correct) or false (inaccurate/incorrect).
Matching Type Test. It consists of 2 sets of items to be matched with
each other based on a specified attribute.
Constructed-response tests require learners to supply answers to a
given question or problem. These include:
Short Answer Test. It consists of open-ended questions or incomplete
sentences that require learners to create an answer for each item, which is
typically a single word or short phrase. This includes the following types:
Completion. It consists of incomplete statements that require the
learners to fill in the blanks with the correct word or phrase.
Identification. It consists of statements that require the learners to
identify or recall the terms/concepts, people, places or events that
are being described.
Essay Test. It consists of problems/questions that require learners to
compose or construct written responses, usually long ones with several
paragraphs.
Problem-solving Test. It consists of problems/questions that require
learners to solve problems in quantitative or non-quantitative settings
using knowledge and skills in mathematical concepts and procedures,
and/or other higher-order cognitive skills (e.g., reasoning, analysis, and
critical thinking).
written test items could be confusing and frustrating to learners and yield test
scores that are not appropriate for evaluating their learning and achievement.
The following are the general guidelines in writing good multiple-choice items.
They are classified in terms of content, stem, and options.
A. Content
1. Write items that reflect only one specific content and cognitive processing
skill.
Faulty: Which of the following is a type of statistical procedure used to test
a hypothesis regarding significant relationship between variables,
particularly in terms of the extent and direction of association?
A. ANCOVA C. Correlation
B. ANOVA D. t-test
Good:
A. ANCOVA C. Chi-Square
B. ANOVA D. Mann-Whitney Test
2. Do not lift and use statements from the textbook or other learning materials
as test questions.
3. Keep the vocabulary simple and understandable based on the level of the
learners/examinees.
4. Edit and proofread the items for grammatical and spelling errors before
administering the test to the learners.
B. Stem
1. Write the directions in the stem in a clear and understandable manner.
Faulty: Read each question and indicate your answer by shading the circle
corresponding to your answer.
Good: This test consists of two parts. Part A is a reading comprehension
test, and Part B is a grammar/language test. Each question is a
multiple-choice item with five (5) options. You need to answer
each question, but you will not be penalized for a wrong answer or for
guessing. You can go back and review your answers during the time
allotted.
2. Write stems that are consistent in form and structure, that is, present all
items either in question form or in description or declarative form.
Faulty: (1) Who was the Philippine president during Martial Law?
(2) The first president of the Commonwealth of the Philippines was
_______.
Good: (1) Who was the Philippine president during Martial Law?
(2) Who was the first president of the Commonwealth of the
Philippines?
3. Express the stem positively and avoid double negatives, such as NOT and
EXCEPT in a stem. If a negative word is necessary, underline or capitalize
the words for emphasis.
Faulty: Which of the following is not the measure of variability?
Good: Which of the following is NOT a measure of variability?
4. Refrain from making the stem too wordy or containing too much
information unless the problem or question requires the facts presented to
solve the problem.
Faulty: What does DNA stand for, and what is the organic chemical of
complex molecular structure found in all cells and viruses and codes
genetic information for the transmission of inherited traits?
Good: As a chemical compound, what does DNA stand for?
C. Options
1. Provide three (3) to five (5) options per item, with only one being the correct
or best answer/alternative.
2. Write options that are parallel or similar in form and length to avoid giving
clues about the correct answer.
Faulty: What is an ecosystem?
3. For each item, include only topics that are related with one another and
share the same foundation of information.
Faulty: Match the following:
A B
_____1. Indonesia A. Asia
_____2. Malaysia B. Bangkok
_____3. Philippines C. Jakarta
_____4. Thailand D. Kuala Lumpur
_____5. Year ASEAN was established E. Manila
F. 1967
Good: On the line to the left of each country in Column I, write the letter of the
country’s capital presented in column II.
Column I Column II
_____1. Indonesia A. Bandar Seri Begawan
_____2. Malaysia B. Bangkok
_____3. Philippines C. Jakarta
_____4. Thailand D. Kuala Lumpur
E. Manila
Item #1 is considered an unacceptable item because its response
options are not parallel and include different kinds of information that
can provide clues to the correct/wrong answers. On the other hand,
item #2 details the basis for matching and the response options only
include related concepts.
4. Make the response options short, homogeneous, and arranged in logical
order.
Faulty: Match the chemical elements with their characteristics.
A B
_____ Gold A. Au
_____ Hydrogen B. Magnetic metal used in steel
_____ Iron C. Hg
_____ Potassium D. K
_____ Sodium E. With lowest density
F. Na
Good: Match the chemical elements with their symbols.
A B
_____ Gold A. Au
_____ Hydrogen B. Fe
_____ Iron C. H
_____ Potassium D. Hg
_____ Sodium E. K
F. Na
In item #1, the response options are not parallel in content and length.
They are also not arranged alphabetically.
5. Include response options that are reasonable, realistic, and similar in
length and grammatical form.
Faulty: Match the subjects with their course description.
A B
___ History A. Studies the production and distribution of
goods/services
___ Political Science B. Study of politics and power
___ Psychology C. Study of society
___ Sociology D. Understand role of mental functions in social
behaviour
E. Uses narratives to examine and analyze past
events
Good: Match the fractions with their decimal equivalents.
A B
___ 1/4 A. 0.09
___ 5/4 B. 0.25
___ 7/25 C. 0.28
___ 9/10 D. 0.90
E. 1.25
Item #1 is considered inferior to item #2 because it includes the same
number of response options as that of the stimuli, thus making it more
prone to guessing.
can create items that minimize guessing and avoid giving clues to the correct
answer.
The following are the general guidelines in writing good fill-in-the-blank
or completion test items:
1. Omit only significant words from the statement.
Faulty: Every atom has a central _____ called a nucleus.
Good: Every atom has a central core called a(n) ______.
In item #1, the word “core” is not the significant word. The item is also
prone to many and varied interpretations, resulting in many possible
answers.
2. Do not omit too many words from the statement such that the intended
meaning is lost.
Faulty: _______ is to Spain as the _______ is to United States and as
_______ is to Germany.
Good: Madrid is to Spain as the ______ is to France.
Item # 1 is prone to many and varied answers. For example, a student
may answer the question based on the capital of these countries or based
on what continent they are located. Item # 2 is preferred because it is
more specific and requires only one correct answer.
3. Avoid obvious clues to the correct response.
Faulty: Ferdinand Marcos declared martial law in 1972. Who was the
president during that period?
Good: The president during the martial law years was ___.
Item #1 already gives a clue that Ferdinand Marcos was the president
during this time because only the president of a country can declare
martial law.
4. Be sure that there is only one correct response.
Faulty: The government should start using renewable energy sources
for generating electricity, such as ____.
Good: The government should start using renewable sources of energy
by using turbines called ___.
Item #1 has many possible answers because the statement is very
general (e.g., wind, solar, biomass, geothermal, and hydroelectric). Item #
2 is more specific and only requires one correct answer (i.e., wind).
There are two types of essay test: (1) extended-response essay and
(2) restricted-response essay.
how much time they should allocate for each item, especially if several
essay questions are presented. How the responses are to be graded
or rated should also be clarified to guide the students on what to
include in their responses.
Example: What is the mean of the following score distribution: 32, 44, 56,
69, 75, 77, 95, 96?
A. 68 D. 74
B. 69 E. 76
C. 72
2. All possible answer choices - This type of question has four or five
options, and students are required to choose all of the options that are
correct.
Example: Consider the following score distribution: 12, 14, 14, 14, 17, 24,
27, 28, and 30. Which of the following is/are the correct measure/s of
central tendency? Indicate all possible answers.
A. Mean = 20 D. Median = 17
B. Mean = 22 E. Mode = 14
C. Median = 16
Options A, D, and E are all correct answers.
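The keyed answers can be verified with Python's standard statistics module; this is a minimal check of the item above, not part of the original text:

```python
import statistics

# Score distribution from the sample item
scores = [12, 14, 14, 14, 17, 24, 27, 28, 30]

mean_score = statistics.mean(scores)      # sum of scores / number of scores
median_score = statistics.median(scores)  # middle value of the ordered scores
mode_score = statistics.mode(scores)      # most frequently occurring score

print(mean_score, median_score, mode_score)  # 20 17 14
```

This confirms options A (Mean = 20), D (Median = 17), and E (Mode = 14).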
3. Type-in answer – This type of question does not provide options to
choose from. Instead, the learners are asked to supply the correct
answer. The teacher should inform the learners at the start how their
answers will be rated. For example, the teacher may require just the
correct answer or may require learners to present the step-by-step
procedure in coming up with their answers. On the other hand, for non-
mathematical problem solving, such as a case study, the teacher may
present a rubric on how the answers will be rated.
Example: Compute the mean of the following score distribution: 32, 44,
56, 69, 75, 77, 95, and 96. Indicate your answer in the blank
provided.
In this case, the learners will only need to give the correct answer
without having to show the procedures for computation.
Example: Lillian, a 55-year-old accountant, has been suffering from
frequent dizziness, nausea, and light-headedness. During the
interview, Lillian was obviously restless and sweating. She
reported feeling so stressed and fearful of anything without any
apparent reason. She could not sleep or eat well. She also
started to withdraw from family and friends, as she experienced
frequent panic attacks. She also said that she was constantly
worrying about everything at work and at home. What might be
Lillian’s problem? What should she do to alleviate all her
symptoms?
Problem-solving test items are a good test format as they minimize
guessing, measure instructional objectives that focus on higher cognitive
levels, and cover an extensive amount of content or topics. However,
they require more time for teachers to construct, read, and correct, and
are prone to rater bias, especially when scoring rubrics/criteria are not
available. It is therefore important that good-quality problem-solving test
items are constructed.
The following are some of the general guidelines in constructing
good problem-solving test items:
1. Identify and explain the problem clearly.
Faulty: Tricia was 135.6 lbs. when she started with her zumba
exercises. After three months of attending the sessions three
times a week, her weight was down to 122.8 lbs. About how
many lbs. did she lose after three months? Write your final
answer in the space provided and show your computations.
[This question asks “about how many” and does not indicate whether
learners need to give the exact weight or whether they need to round
off their answer and to what extent.]
Good: Tricia was 135.6 lbs. when she started with her zumba
exercises. After three months of attending the sessions three
times a week, her weight was down to 122.8 lbs. How many
lbs. did she lose after three months? Write your final answer in
the space provided and show your computations. Write the
exact weight; do not round off.
2. Be specific and clear about the type of response required from the
students.
Faulty: ASEANA Bottlers, Inc. has been producing and selling Tutti
Fruity juice in the Philippines, aside from their Singapore market.
The sales for the juice in the Singapore market were $5 million
Good: ASEANA Bottlers, Inc. has been producing and selling Tutti
Fruity juice in the Philippines, aside from their Singapore market.
The sales for the juice in the Singapore market were S$5
million more than those of their Philippine market in 2016, S$3
million more in 2017, and S$4.5 million more in 2018. If the sales in
the Philippine market in 2018 were PHP 35 million, what were the
sales in the Singapore market during that year? Provide the answer in
Singapore dollars (S$1 = PHP 36.50). [This is a better item
because it specifies in what currency the answer should be
presented, and the exchange rate is given.]
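A sketch of the intended computation, assuming the comparison is between the Philippine and Singapore markets and that 2018 Singapore sales were S$4.5 million more than Philippine sales (both readings of the item are assumptions for illustration):

```python
# Assumed reading of the item: 2018 Singapore sales exceed
# 2018 Philippine sales by S$4.5 million
philippine_sales_php = 35_000_000  # PHP 35 million, Philippine market, 2018
php_per_sgd = 36.50                # exchange rate: S$1 = PHP 36.50
sgd_gap_2018 = 4_500_000           # assumed S$ gap for 2018

philippine_sales_sgd = philippine_sales_php / php_per_sgd
singapore_sales_sgd = philippine_sales_sgd + sgd_gap_2018

print(round(singapore_sales_sgd, 2))  # 5458904.11, i.e., about S$5.46 million
```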
3. Specify in the directions the bases for grading students’
answer/procedures.
Faulty: VCV Consultancy Firm was commissioned to conduct a survey
on the voters’ preferences in Visayas and Mindanao for the
upcoming presidential election. In Visayas, 65% are for the Liberal
Party (LP) candidate, while 35% are for the Nationalists. In
Mindanao, 70% of the voters are Nationalists, while 30% are LP
supporters. A survey was conducted among 200 voters for each
region. What is the probability that the survey will show a greater
percentage of Liberal Party supporters in Mindanao than in the
Visayas region?
Good: VCV Consultancy Firm was commissioned to conduct a survey
on the voters’ preferences in Visayas and Mindanao for the
upcoming presidential election. In Visayas, 65% are for the Liberal
Party (LP) candidate, while 35% are for the Nationalist Party
(NP) candidate. In Mindanao, 70% of the voters are Nationalist,
while 30% are LP supporters. A survey was conducted among
200 voters for each region.
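For reference, an item like this is usually keyed with the normal approximation to the sampling distribution of the difference between two sample proportions. The sketch below follows that standard approach; it is an assumed solution path, not one stated in the text:

```python
import math

# Population proportions of LP supporters, taken from the item
p_visayas = 0.65
p_mindanao = 0.30
n = 200  # voters surveyed per region

# Standard error of (sample Mindanao LP share - sample Visayas LP share)
se = math.sqrt(p_visayas * (1 - p_visayas) / n
               + p_mindanao * (1 - p_mindanao) / n)

# We need P(difference > 0) when the true mean difference is 0.30 - 0.65 = -0.35
z = (0 - (p_mindanao - p_visayas)) / se

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

probability = 1 - normal_cdf(z)
print(probability)  # effectively zero: a 35-point true gap is all but impossible to reverse
```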
Assessment
A. Let us review what you have learned about constructing traditional tests.
1. What factors should be considered when choosing a particular test
format?
2. What are the major categories and formats of traditional tests?
3. When are the following traditional tests appropriate to use?
- Multiple-choice test
- Matching-type test
- True or false test
- Short-answer test
- Essay test
- Problem-solving test
4. How should the items for the above traditional tests be constructed?
To check whether you have learned the important information about
constructing the traditional types of tests, please complete the following
graphical representation:
Learning outcome: Apply the concepts of demand and supply in actual cases
Topics: Effects of change of demand and supply on market price; Exchange
Rate; Change in the Price of Goods in the Market; Price Ceiling and Price
Floor
Assessment methods: Essay, problem sets, case analysis, and exercises;
Others
B. Now that you are able to identify the types of assessment that you will
employ for each desired learning outcome for a subject, you are now
ready to construct sample tests for the subject. Construct a three-part test
that includes test formats of your choice. In the development of the test,
you will need the following information:
1. Desired learning outcomes for subject area.
2. Level of cognitive/thinking skills appropriate to assess the desired
learning outcomes
3. Appropriate test format to use
C. Review the true-false items you have written using the checklist below:
Yes No
1. Is the item completely true or completely false?
2. Is the item written in simple, easy-to-follow statements?
3. Are negatives avoided?
4. Are absolutes such as “always” and “never” used sparingly
or not at all?
5. Do items express only a single idea?
6. Is the use of unfamiliar vocabulary avoided?
7. Is the item or statement not lifted from the text, lecture, or
other materials?
D. Evaluate the level of your skills in developing different test formats using the
following scale:
Level Performance Benchmarking Multiple-Choice Matching-Type True-False Short-Answer Essay
Proficient I know this very well. I can teach others how to make one. 4 4 4 4 4
Master I can do it by myself, though I sometimes make mistakes. 3 3 3 3 3
Developing I am getting there, though I still need help to be able to perfect it. 2 2 2 2 2
Novice I cannot do it myself. I need help to make a good/effective test. 1 1 1 1 1
E. Based on your self-assessment, choose from the following tasks to help you enhance
your skills and competencies in developing different test formats:
Level Possible Tasks
Proficient Help or mentor peer/classmates who are having
difficulty in developing good items for their course
assessment.
Master Examine the areas that you need to improve on and
address them immediately.
Developing/ Read more books/references on how to develop
Novice effective items.
Work and collaborate with your peer/classmates in
developing a particular test format.
Ask your teacher to evaluate the items that you have
developed and to give suggestions on how you can
improve your skills in constructing items.
F. Test your understanding about constructing test items for different test
formats. Answer the following items.
1. What are these statements that learners are expected to do or
demonstrate as a result of engaging in the learning process?
A. Desired learning outcomes C. Learning intents
B. Learning goals D. Learning objectives
2. Which of the following is NOT a factor to consider when choosing a
particular test format?
A. Desired learning outcomes of the lesson
B. Grade level of students
C. Learning activities
D. Level of thinking to be assessed
3. Ms. Daniel is planning to use a traditional/conventional type of
classroom assessment for her Trigonometry quarterly quiz. Which of
the following test formats will she likely NOT use?
Educators’ Feedback
Ms. Cudera teaches Practical Research 1 and 2 in a public senior high
school. When asked about his experiences in writing test items for his
subjects, he cited his practice of referring back to the appropriate cognitive
domain expected of every learning competency and using varied types of
assessments to measure his students’ achievement of these expected
outcomes. This is what he shared:
“As a teacher in senior high school, I always make sure that my periodical
exams measure the expected learning competencies as stipulated in the
curriculum guide of the Department of Education. I then create a table of
specifications, wherein I follow the correct item allocation per competency based
on the number of hours being taught in the class and the expected learning
outcomes as specified in the DepEd Curriculum Guide, and in assessing
students, I am always guided by DepEd Order No. 8, s. 2015, also known as
the Policy Guidelines on Classroom Assessment for the K to 12 Basic Education
Program.
For this school year, I was assigned to teach Practical Research 1 and 2
courses. To assess students’ learning or achievement, I first conducted
formative assessment to provide me some background on what students know
about Research. The result of the formative assessment allowed me to revise
my lesson plans and gave me some directions on how to proceed with and
handle the courses.
As part of the course requirements, I gave the students a lot of writing activities,
wherein they were required to write drafts of each part of the research. For each
work submitted, I read, checked, and gave comments and suggestions on how
to improve their drafts. I then allowed them to rewrite and revise their works. The
final research paper is used as the basis for summative assessment.
Furthermore, I also relied heavily on essay tests and other performance tasks.
As I have mentioned, I required students to produce or write the different parts
of a research paper as outputs. They were also required to gather data for their
research. I utilized a rubric that was conceptualized collaboratively with my
students in order to evaluate their outputs. I used a 360-degree evaluation of
their output, wherein, aside from my assessment, group members would assess
one another’s work and the leader would also evaluate the work of the members.
I also conducted item analysis after every periodical exam to identify the least
mastered competencies for a given period, which helps improve the performance of
the students.”
Pre-discussion
By now, it is assumed that you know how to plan a classroom test by
specifying the purpose for constructing it, identifying the instructional outcomes to
be assessed, and preparing a test blueprint to guide the construction process.
The techniques and strategies for selecting and constructing different item
formats to match the intended instructional outcomes make up the second
phase of the test development process, which is the content of the preceding
lesson. The process, however, is not complete without ensuring that the
classroom instrument is valid for the purpose for which it is intended. Ensuring
validity requires reviewing and improving the items, which is the next stage in the
process. This lesson offers pre-service teachers practical and
necessary ways of improving teacher-developed assessment tools.
What to Expect?
At the end of the lesson, the students can:
1. list down the different ways for judgmental item-improvement and
other empirically-based procedures;
2. evaluate which type of test item-improvement is appropriate to use;
3. compute and interpret the results for index of difficulty, index of
discrimination and distracter efficiency; and
4. demonstrate knowledge on the procedures for improving a classroom-
based assessment.
Judgmental Item-Improvement
This approach basically makes use of human judgment in reviewing
the items. The judges are teachers themselves who know exactly what the
test for, the instructional outcomes to be assessed, and the items’ level of
difficulty appropriate to his/her class; the teacher’s peers or colleagues who
are familiar with the curriculum standards for the target grade level, the
subject matter content, and the ability of the learners; and the students
themselves who can perceive difficulties based on their past experiences.
Teachers’ Own Review
Peer review
There are schools that encourage peer or collegial review of
assessment instruments among themselves. Time is provided for this activity
and it has almost always yielded good results for improving tests and
performance-based assessment tasks. During these teacher dyad or triad
sessions, those teaching the same subject area can openly review together
the classroom tests and tasks they have devised against some consensual
criteria. The suggestions given by test experts can actually be used collegially
as basis for a review checklist:
a. Do the items follow the specific and general guidelines in writing
items especially on:
Being aligned to instructional objectives?
Making the problem clear and unambiguous?
Providing plausible options?
Avoiding unintentional clues?
Having only one correct answer?
b. Are the items free from inaccurate content?
c. Are the items free from obsolete content?
d. Are the test instructions clearly written for students to follow?
e. Is the level of difficulty of the test appropriate to level of learners?
f. Is the test fair to all kinds of students?
Student Review
Engaging students in reviewing items has become a laudable
practice for improving classroom tests. The judgment is based on the students’
experience in taking the test and their impressions and reactions during the
testing event. The process can be efficiently carried out through the use of a
review questionnaire. Popham (2011) illustrates a sample questionnaire,
shown in the textbox below. It is better to conduct the review activity a day
after taking the test, so the students still remember the experience when they
see a blank copy of the test.
Item-Improvement Questionnaire for Students
If any of the items seemed confusing, which ones were they?
Did any items have more than one correct answer? If so, which ones?
Did any items have no correct answers? If so, which ones?
Were there words in any item that confused you? If so, which ones?
Were the directions for the test, or for particular sub-sections,
unclear? If so, which ones?
Another technique for eliciting student judgment for item improvement is
for the teacher to go over the test with the students before the results are shown.
Students usually enjoy this activity since they can get feedback on the
answers they have written. As the class tackles each item, the students can be
asked to give their answers, and if there is more than one possible correct answer,
the teacher makes notations for item alterations. Having more than one correct
answer signals ambiguity either in the stem or in the given options. The
teacher may also take the chance to observe sources of confusion, especially
when answers vary. During this session, it is important for the teacher to
maintain an atmosphere that allows students to question and give
suggestions. It also follows that after an item review session, the teacher
should be willing to modify incorrectly keyed answers.
Empirically-based Procedures
Item improvement using empirically-based methods is aimed at
improving the quality of an item using students’ responses to the test. Test
developers refer to this technical process as item analysis, as it utilizes data
obtained separately for each item. An item is considered good when its
quality indices, i.e., difficulty index and discrimination index, meet certain
characteristics. For a norm-referenced test, these two indices are related.
Difficulty Index
An item is difficult if the majority of students are unable to provide the
correct answer; it is easy if the majority of the students are able to answer
correctly. An item can discriminate if the examinees who score high in the test
answer the item correctly more often than the examinees who got low scores.
Below is a data set of five items on the addition and subtraction of
integers. Follow the procedure to determine the difficulty and discrimination of
each item.
1. Get the total score of each student and arrange scores from highest to
lowest.
Item 1 Item 2 Item 3 Item 4 Item 5
Student 1 0 0 1 1 1
Student 2 1 1 1 0 1
Student 3 0 0 0 1 1
Student 4 0 0 0 0 1
Student 5 0 1 1 1 1
Student 6 1 0 1 1 0
Student 7 0 0 1 1 0
Student 8 0 1 1 0 0
Student 9 1 0 1 1 1
Student 10 1 0 1 1 0
2. Obtain the upper and lower 27% of the group. Multiplying 0.27 by the total
number of students (10) gives 2.7, which rounds to 3. Get the top three
students and the bottom three students based on their scores. The top
three students are Students 2, 5, and 9. The bottom three students are
Students 7, 8, and 4. The rest of the students are not included in the item
analysis.
3. Obtain the proportion of correct answers for each item. This is computed
separately for the upper 27% group and the lower 27% group by summing
the correct answers per item and dividing by the number of students in the
group.
Upper 27% Item 1 Item 2 Item 3 Item 4 Item 5 Total
Student 2 1 1 1 0 1 4
Student 5 0 1 1 1 1 4
Student 9 1 0 1 1 1 4
Total 2 2 3 2 3
Proportion of the 0.67 0.67 1.00 0.67 1.00
high group (pH)
Lower 27% Item 1 Item 2 Item 3 Item 4 Item 5 Total
Student 7 0 0 1 1 0 2
Student 8 0 1 1 0 0 2
Student 4 0 0 0 0 1 1
Total 0 1 2 1 1
Proportion of the 0.00 0.33 0.67 0.33 0.33
low group (pL)
4. Compute the item difficulty as the average of the proportions of the two
groups: Item difficulty (p) = (pH + pL) / 2
Computations
Item 1: (0.67 + 0.00) / 2 = 0.335
Item 2: (0.67 + 0.33) / 2 = 0.500
Item 3: (1.00 + 0.67) / 2 = 0.835
Item 4: (0.67 + 0.33) / 2 = 0.500
Item 5: (1.00 + 0.33) / 2 = 0.665
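The steps above can be automated with a short script. This sketch (Python, for illustration) reuses the 10-student response matrix and the upper/lower groups identified in the text, and assumes the difficulty index is taken as the average of the high- and low-group proportions, a common convention when only the upper and lower 27% are analyzed:

```python
# Responses (1 = correct, 0 = wrong) to Items 1-5, keyed by student number
responses = {
    1: [0, 0, 1, 1, 1],  2: [1, 1, 1, 0, 1],  3: [0, 0, 0, 1, 1],
    4: [0, 0, 0, 0, 1],  5: [0, 1, 1, 1, 1],  6: [1, 0, 1, 1, 0],
    7: [0, 0, 1, 1, 0],  8: [0, 1, 1, 0, 0],  9: [1, 0, 1, 1, 1],
    10: [1, 0, 1, 1, 0],
}

# Upper and lower 27% groups (3 students each), as selected in the text;
# ties at the cut-off score are resolved the same way the example does
upper = [2, 5, 9]
lower = [7, 8, 4]

def proportion(group, item):
    """Share of a group answering an item (0-indexed) correctly."""
    return round(sum(responses[s][item] for s in group) / len(group), 2)

p_h = [proportion(upper, i) for i in range(5)]  # [0.67, 0.67, 1.0, 0.67, 1.0]
p_l = [proportion(lower, i) for i in range(5)]  # [0.0, 0.33, 0.67, 0.33, 0.33]

# Assumed convention: difficulty index = average of the two group proportions
difficulty = [(h + l) / 2 for h, l in zip(p_h, p_l)]
for i, p in enumerate(difficulty, start=1):
    print(f"Item {i}: p = {p:.3f}")
```

Since a higher p means more examinees answered correctly, Item 3 (p = 0.835) is the easiest item for this class and Item 1 (p = 0.335) is the hardest.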
Discrimination Index
Obviously, the power of an item to discriminate between informed and
uninformed groups, or between more knowledgeable and less knowledgeable
learners, is shown by the item-discrimination index (D). This is an item
statistic that can reveal useful information for improving an item. Basically,
an item-discrimination index shows the relationship between a student’s
performance on an item (i.e., right or wrong) and his or her total performance on the
test, represented by the total score. The item-total correlation is usually part of a
package from item analysis. High item-total correlations indicate that
the items contribute well to the total score, so that responding correctly to these
items gives a better chance of obtaining relatively high total scores in the whole
test or subtest.
For classroom tests, the discrimination index shows if a difference
exists between the performance of those who scored high and those who
scored low in the item. As a general rule, the higher the discrimination index
(D), the more marked the magnitude of the difference is, and thus, the more
discriminating the item is. The nature of the difference however, can take
different directions.
a. Positively discriminating item – proportion of high scoring group is
greater than that of the low scoring group
b. Negatively discriminating item – proportion of high scoring group is
less than that of the low scoring group
c. Not discriminating item – proportion of high scoring group is equal
to that of the low scoring group
Computing the discrimination index therefore requires obtaining the
difference between the proportion of the high-scoring group getting the item
correctly and the proportion of the low-scoring group getting the item correctly
using this simple formula:
D = RU/TU – RL/TL
where D = item discrimination index
RU = number in the upper group getting the item correct
TU = number in the upper group
RL = number in the lower group getting the item correct
TL = number in the lower group
Another calculation brings about the same result when the two groups are of
equal size:
D = (RU – RL)/T
where RU = number in the upper group getting the item correct
RL = number in the lower group getting the item correct
T = number of students in either group
As you can see, R/T is actually the p-value of an item. So getting
D amounts to getting the difference between the p-value of the upper group
and the p-value of the lower group. The formula for the discrimination index
(D) can therefore also be given as (Popham, 2011):
D = pU – pL
where pU is the p-value for upper group (RU/TU)
pL is the p-value for lower group (RL/TL)
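The whole upper–lower procedure above can be sketched in code. This is a minimal illustration using the ten-student response matrix from the worked example; the group membership and formulas follow the text (difficulty taken as the average of pH and pL, and D = pU – pL).

```python
# Upper-lower (27%) item analysis for the worked example above.
# Each row holds one student's responses to the five items (1 = correct).
responses = {
    1: [0, 0, 1, 1, 1], 2: [1, 1, 1, 0, 1], 3: [0, 0, 0, 1, 1],
    4: [0, 0, 0, 0, 1], 5: [0, 1, 1, 1, 1], 6: [1, 0, 1, 1, 0],
    7: [0, 0, 1, 1, 0], 8: [0, 1, 1, 0, 0], 9: [1, 0, 1, 1, 1],
    10: [1, 0, 1, 1, 0],
}
upper = [2, 5, 9]   # top 27% (3 students) by total score
lower = [7, 8, 4]   # bottom 27% (3 students)

difficulty, discrimination = [], []
for item in range(5):
    p_h = sum(responses[s][item] for s in upper) / len(upper)  # pH
    p_l = sum(responses[s][item] for s in lower) / len(lower)  # pL
    difficulty.append((p_h + p_l) / 2)   # average proportion correct
    discrimination.append(p_h - p_l)     # D = pU - pL

for i, (p, d) in enumerate(zip(difficulty, discrimination), start=1):
    print(f"Item {i}: difficulty = {p:.2f}, D = {d:.2f}")
```

Item 3, for instance, comes out easy (p ≈ 0.83) yet still positively discriminating (D ≈ 0.33), matching the proportions tabulated earlier.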
Distracter Analysis
Another empirical procedure to discover areas for item improvement
utilizes an analysis of the distribution of responses across the distractors.
When the difficulty index and discrimination index of an item suggest that it
is a candidate for revision, distractor analysis becomes a useful follow-up.
In distractor analysis, however, we are no longer interested in how test
takers select the correct answer, but in how well the distractors function
by drawing test takers away from the correct answer. The number of times each
distractor is selected is noted in order to determine its effectiveness. We
would expect a viable distractor to be selected by enough examinees. What
exactly is an acceptable value? This depends to a large extent on the
difficulty of the item itself and what we consider an acceptable item
difficulty value for test items. If we assume that 0.7 is an appropriate item
difficulty value, then we should expect the remaining 0.3 to be about evenly
distributed among the distractors. Let us take the following test item as an
example:
In the story, he was unhappy because …………
A. it rained all day
B. he was scolded
C. he hurt himself
D. the weather was hot
Let us assume that 100 students took the test. If we assume that A is
the answer and the item difficulty is 0.7, then 70 students answered correctly.
What about the remaining 30 students and the effectiveness of the three
distractors? If all 30 selected D, the distractors B and C are useless in their
role as distractors. Similarly, if 15 students selected D and another 15
selected B, then C is not an effective distractor and should be replaced. The
ideal situation would be for each of the three distractors to be selected by 10
students. Therefore, for an item which has an item difficulty of 0.7, the ideal
effectiveness of each distractor can be quantified as 10/100 or 0.1. What
would be the ideal value for the distractors in a four-option multiple-choice
item when the item difficulty is 0.4? Hint: You need to identify the
proportion of students who did not select the correct option.
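The arithmetic in this paragraph generalizes: whatever proportion does not choose the key should ideally be split evenly among the remaining options. A small sketch (the function name is ours, not from the text):

```python
def ideal_distractor_share(difficulty: float, n_options: int) -> float:
    """Ideal proportion of examinees drawn by each distractor."""
    # Examinees who missed the item, spread evenly over the distractors.
    return (1 - difficulty) / (n_options - 1)

# The worked example: p = 0.7, four options -> 0.3 / 3 = 0.1 per distractor
print(ideal_distractor_share(0.7, 4))
```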
From a different perspective, the item discrimination formula can also
be used in distractor analysis. The concept of upper and lower groups still
applies, but the analysis and expectation differ slightly from the regular
item discrimination we looked at earlier. Instead of expecting a positive
value, we should logically expect a negative value, as more students from the
lower group should select the distractors. Each distractor can have its own
discrimination value, allowing us to analyse how the distractors work and
ultimately refine the effectiveness of the test item itself. If we use the
above item as an example, the item discrimination concept can be used to
assess the effectiveness of each distractor. Consider a class of 100 students;
we then form upper and lower groups of 30 students each. Assume the following
results are observed:
Summary
Enrichment
Read the following studies:
1. “Difficulty Index, Discrimination Index and Distractor Efficiency in
Multiple Choice Questions,” available from
https://www.researchgate.net/publication/323705126
2. “Item Discrimination and Distractor Analysis: A Technical Report on
Thirty Multiple Choice Core Mathematics Achievement Test Items,”
available from https://www.researchgate.net/publication/335892361
3. “Index and Distractor Efficiency in a Formative Examination in
Community Medicine,” available from
https://www.researchgate.net/publication/286478898
4. “Impact of distractors in item analysis of multiple choice questions.”
Available from : https://www.researchgate.net/publication/332050250
Assessment
A. Below are descriptions of procedures done to review and improve test
items. On the space provided, write J if a judgmental approach is used and
E if an empirical approach is used.
C. Below are additional data collected for the same items. Calculate another quality
index and indicate what needs to be done with the obtained index as a basis.
Item   Upper Group   Lower Group   Index   Revision needed to be done
1 25 9
2 9 9
3 2 8
4 38 8
5 1 7
D. A distracter analysis is given for a test item given to a class of 60. Obtain the
necessary item statistics using the given data.
Item   Difficulty   Discrimination   Group          Alternatives
       index        index            (N=30)    A   B   C   D   Omit
1                                    Upper
                                     Lower
E. For each item, write the letter of the correct answer on the space
provided.
1. Below are different ways of utilizing the concept of discrimination as an
index of item quality EXCEPT
a. Getting the proportion of those answering the item correctly over
the total number answering the item
b. Obtaining the difference between the proportion of high-scoring
group and the proportion of low-scoring group getting the item
correctly
c. Getting how much better the performance of the class by item is
after instruction than before
d. Differentiating the performance in an item of a group that has
received instruction and a group that has not
2. What can enable some students to answer items correctly even without
having enough knowledge for what is intended to be measured?
a. Clear and brief test instructions
b. Comprehensible statement of the item stem
c. Obviously correct and obviously wrong alternatives
d. Simple sentence structure of the problem
References
Conduct the Item Analysis. Retrieved from
http://www.proftesting.com/test_topics/steps_9.php
Pre-discussion
To be able to successfully perform the expected performance tasks,
students should have prepared a test following the proper procedure with
clear learning targets (objectives), table of specifications, and pre-test data
per item. In the previous lesson, guidelines were provided in constructing test
following different formats. They have also learned that assessment becomes
valid when the test items represent a good set of objectives, and this should
be found in table of specifications. The learning objectives or targets will help
them construct appropriate test items.
What to Expect?
At the end of this lesson, the students can:
Test Validity
A test is valid when it measures what it is supposed to measure.
Validity pertains to the connection between the purpose of the test and which
data the teacher chooses to quantify that purpose.
If a quarterly exam is valid, then its contents should directly measure
the objectives of the curriculum. If a scale that measures personality is
composed of five factors, then each factor should be made up of items that
are highly correlated with one another. If an entrance exam is valid, it
should predict students' grades after the first semester.
It is better to understand the definition through looking at examples of
invalidity. Colin Foster, an expert in mathematics education at the University
of Nottingham, gives the example of a reading test meant to measure literacy
that is given in a very small font size. A highly literate student with bad
eyesight may fail the test because they cannot physically read the passages
supplied. Thus, such a test would not be a valid measure of literacy (though it
may be a valid measure of eyesight). Such an example highlights the fact that
validity is wholly dependent on the purpose behind a test. More generally, in a
study plagued by weak validity, “it would be possible for someone to fail the
test situation rather than the intended test subject.”
A case is provided for each type of validity that illustrates how it
is conducted. After reading the cases and references about the different
kinds of validity, look for a partner and answer the following questions.
Discuss your answers. You may use other references and browse the internet.
1. Content Validity
A coordinator in science is checking the science test paper for Grade 4.
She asked the Grade 4 science teacher to submit the table of specifications
containing the objectives of the lesson and the corresponding items. The
coordinator checked whether each item is aligned with the objectives.
How are the objectives used when creating test items?
How is content validity determined when given the objectives and the
items in a test?
What should be present in a test table of specifications when
determining content validity?
Who checks the content validity of items?
2. Face Validity
The assistant principal browsed the test paper made by the math
teacher. She checked if the contents of the items are about mathematics. She
examined if instructions are clear. She browsed through the items if the
grammar is correct and if the vocabulary is within the student’s level of
understanding.
What can be done in order to ensure that the assessment appears to be
effective?
What practices are done in conducting face validity?
Why is face validity the weakest form of validity?
3. Predictive Validity
The school admission’s office developed an entrance examination. The
officials wanted to determine if the results of the entrance examination are
accurate in identifying good students. They took the grades of the students
accepted for the first quarter. They correlated the entrance exam results and
the first quarter grades. They found significant and positive correlations
between the entrance examination scores and the grades. The entrance
examination results predicted the grades of students after the first quarter.
Thus, there was predictive validity.
Why are two measures needed in predictive validity?
What is the assumed connection between these two measures?
How can we determine if a measure has predictive validity?
What statistical analysis is done to determine predictive validity?
How can the test results of predictive validity be interpreted?
4. Concurrent Validity
A school Guidance Counsellor administered a math achievement test
to Grade 6 students. She also has a copy of the students’ grades in math.
She wanted to verify if the math grades of the students are measuring the
same competencies as the math achievement test. The school counsellor
correlated the math achievement scores and math grades to determine if they
are measuring the same competencies.
What needs to be available when conducting concurrent validity?
At least how many tests are needed for conducting concurrent validity?
What statistical analysis can be used to establish concurrent validity?
How are the results of a correlation coefficient interpreted for concurrent
validity?
5. Construct Validity
A science test was made by a Grade 10 teacher composed of four
domains: matter, living things, force and motion, and earth space. There are
10 items under each domain. The teacher wanted to determine if the 10 items
made under each domain really belonged to that domain. The teacher
consulted an expert in test measurement. They conducted a procedure called
factor analysis. Factor analysis is a statistical procedure done to determine
if the items written load under the domain to which they belong.
What type of test requires construct validity?
What should the test have in order to verify its constructs?
What are constructs and factors in a test?
How can these factors be verified if they are appropriate for the test?
What results come out in construct validity?
6. Convergent Validity
A Math teacher developed a test to be administered at the end of the
school year, which measures number sense, patterns and algebra,
measurement, geometry, and statistics. It is assumed by the math teacher
that students’ competencies in number sense improve their capacity to learn
patterns and algebra and other concepts. After administering the test, the
scores were separated for each area, and these five domains were inter-
correlated using Pearson r. The positive correlation between number sense
and patterns and algebra indicates that, when number sense scores increase,
the patterns and algebra scores also increase. This shows that students'
learning of number sense scaffolds their patterns and algebra competencies.
What should a test have in order to conduct convergent validity?
What are done with the domains in a test on convergent validity?
What analysis is used to determine convergent validity?
How are the results in convergent validity interpreted?
7. Divergent Validity
An English teacher taught metacognitive awareness strategy to
comprehend a paragraph for Grade 11 students. She wanted to determine if
the performance of her students in reading comprehension would reflect well
in the reading comprehension test. She administered the same reading
comprehension test to another class which was not taught the metacognitive
awareness strategy. She compared the results using a t-test of independent
samples and found that the class that was taught metacognitive awareness
strategy performed significantly better than the other group. The test has
divergent validity.
What conditions are needed to conduct divergent validity?
What assumption is being proved in divergent validity?
What statistical analysis can be used to establish divergent validity?
How are the results of divergent validity interpreted?
Test Reliability
Reliability is not concerned with intent; instead, it asks whether the
test used to collect data produces consistent results. In this context,
accuracy is defined by consistency, that is, whether the results could be
replicated.
Also, reliability is the consistency of the responses to a measure under
three conditions:
1. when retested on the same person;
2. when retested on the same measure; and
3. similarity of responses across items that measure the same
characteristic.
In the first condition, a consistent response is expected when the test is
given to the same participants a second time. In the second condition,
reliability is attained if the responses to the test are consistent with
those on an equivalent or parallel test that measures the same characteristic
when administered at a different time. In the third condition, there is
reliability when the person responds in the same way or consistently
across items that measure the same characteristic.
There are different factors that affect the reliability of a measure. The
reliability of a measure can be high or low, depending on factors such as
the following:
1. The number of items in a test – The more items a test has, the
higher the likelihood of reliability. The probability of obtaining consistent
scores is high because of the large pool of items.
1. Linear regression
Linear regression is demonstrated when you have two measured variables,
such as two sets of scores on a test taken at two different times by the same
participants. When the two sets of scores are plotted in a graph (with an X-
and a Y-axis), they tend to form a straight line. The straight line formed by
the two sets of scores can produce a linear regression. When a straight line
is formed, we can say that there is a correlation between the two sets of
scores. This correlation is shown in the graph given, which is called a
scatterplot. Each point in the scatterplot is a respondent with two scores
(one for each test).
Formula:

r = [N∑XY – (∑X)(∑Y)] / √{[N∑X² – (∑X)²][N∑Y² – (∑Y)²]}

where
∑X – sum of all the X scores (Monday scores)
∑Y – sum of all the Y scores (Tuesday scores)
∑X² – sum of the squared values of X
∑Y² – sum of the squared values of Y
∑XY – sum of the products of the X and Y scores
N – number of paired scores

Substituting the values for the two sets of scores yields r = 0.80.
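As a sketch, the raw-score Pearson r computation described above can be written directly from the column sums. The Monday/Tuesday scores below are hypothetical, not the ones behind the 0.80 reported in the text.

```python
import math

def pearson_r(x, y):
    """Raw-score Pearson r from N, sum X, sum Y, sum X^2, sum Y^2, sum XY."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sx2 = sum(a * a for a in x)
    sy2 = sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt(
        (n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))

monday = [12, 15, 11, 18, 14]    # hypothetical first-administration scores
tuesday = [13, 16, 10, 19, 15]   # hypothetical retest scores
print(round(pearson_r(monday, tuesday), 2))
```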
Student   Item 1  Item 2  Item 3  Item 4  Item 5  Total (X)  Score–Mean  (Score–Mean)²
A           5       5       4       4       1        19          2.8          7.84
B           3       4       3       3       2        15         -1.2          1.44
C           2       5       3       3       3        16         -0.2          0.04
D           1       4       2       3       3        13         -3.2         10.24
E           3       3       4       4       4        18          1.8          3.24
Total (∑X)  14      21      16      17      13    Mean = 16.2   ∑(Score–Mean)² = 22.8
Mean        2.8     4.2     3.2     3.4     2.6   Variance of totals = 22.8/4 = 5.7
SD²         2.2     0.7     0.7     0.3     1.3   ∑SD² = 5.2
The Cronbach’s alpha formula is given by:

α = [k / (k – 1)] × [1 – (∑SD²item / SD²total)]

Hence,

α = (5/4) × (1 – 5.2/5.7) ≈ 0.11
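The table's computations can be reproduced in a few lines. This sketch uses sample variances (n – 1 in the denominator), matching the tabled values of 5.7 for the totals and 5.2 for the sum of item variances.

```python
def cronbach_alpha(item_scores):
    """item_scores: one inner list per item, each holding the students' scores."""
    k = len(item_scores)        # number of items
    n = len(item_scores[0])     # number of students

    def var(xs):                # sample variance (n - 1 denominator)
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    totals = [sum(item[i] for item in item_scores) for i in range(n)]
    return (k / (k - 1)) * (1 - sum(var(it) for it in item_scores) / var(totals))

# Items 1-5 for students A-E, read column-wise from the table above
items = [
    [5, 3, 2, 1, 3],
    [5, 4, 5, 4, 3],
    [4, 3, 3, 2, 4],
    [4, 3, 3, 3, 4],
    [1, 2, 3, 3, 4],
]
print(round(cronbach_alpha(items), 2))  # ≈ 0.11
```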
The scores given by the three raters are first computed by summing up
the total rating for each demonstration. The mean is obtained for these sums
of ratings (X̄ = 8.4). The mean is subtracted from each sum of ratings to give
a difference (D). Each difference is squared (D²), then the sum of squares is
computed (∑D² = 33.2). The mean and the sum of squared differences are
substituted into the Kendall's W formula. In the formula, m is the number of
raters while k is the number of students who performed the demonstrations.
Let us consider the formula and the substitution of values:
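The Kendall's W formula referred to is W = 12∑D² / [m²k(k² – 1)]. The individual rating sums are not shown in this excerpt, so the ones below are hypothetical, chosen only to match the summary values the text reports (mean of the sums = 8.4, ∑D² = 33.2, m = 3 raters, k = 5 students):

```python
def kendalls_w(rating_sums, m):
    """W = 12 * sum(D^2) / (m^2 * k * (k^2 - 1)), where D = sum - mean of sums."""
    k = len(rating_sums)
    mean = sum(rating_sums) / k
    ss = sum((s - mean) ** 2 for s in rating_sums)  # sum of squared deviations
    return 12 * ss / (m ** 2 * k * (k ** 2 - 1))

sums = [5, 6, 9, 10, 12]   # hypothetical sums of the three raters' scores
print(round(kendalls_w(sums, m=3), 2))  # ≈ 0.37
```

Identical rankings from all raters (e.g., rank sums 3, 6, 9, 12, 15) give W = 1, the ceiling of the coefficient.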
Summary
A test is valid when it measures what it is supposed to measure. It can be
categorized as face, content, construct, predictive, concurrent, convergent,
or divergent validity.
Reliability is the consistency of the responses to a measure. It can be
established through test-retest, parallel-forms, split-half, internal-
consistency, and inter-rater reliability.
Enrichment
A. Get a journal article about a study that developed a measure or conducted
validity or reliability tests. You may also download one from any of the
following open-access sources.
Google Scholar
Directory of open access journals
Multidisciplinary open access journals
Allied academics journals
Your task is to write a short report focusing on important information on
how the authors conducted and established test validity and reliability.
Provide the following information.
1. Purpose of the study
2. Describe the instrument with its underlying factors
3. Validity technique used in the study and analysis they used
4. Reliability techniques used in the study and analysis used
5. Results of the tests validity and reliability
B. Learn more on Reliability and Validity in Student Assessment by watching
a clip from http://www.youtube.com/watch?v=gzv8Cm1jC4M.
Assessment
A. Indicate the type of reliability applicable for each case. Write the type
of reliability on the space before the number.
Reliability Cases
Type
1. Mr. Perez conducted a survey of his students to determine
their study habits. Each item is answered using a five-point
scale (always, often, sometimes, rarely, never). He wanted
to determine if the responses for each item are consistent.
What reliability technique is recommended?
2. A teacher administered a spelling test to her students. After
a day, another spelling test was given with the same length
and stress of words. What reliability can be used for the
two spelling tests?
3. A PE teacher requested two judges to rate the dance
performance of her students in physical education. What
reliability can be used to determine the reliability of the
judgements?
4. An English teacher administered a test to determine
students’ use of verb given a subject with 20 items. The
scores were divided into items 1 to 10, and another for
items 11 to 20. The teacher correlated the two set of
scores that form the same test. What reliability is done
here?
5. A computer teacher gave a set of typing tests on
Wednesday and gave the same set the following week.
The teacher wanted to know if the students’ typing skills
are consistent. What reliability can be used?
B. Indicate the type of validity applicable for each case. Write the type of validity on
the blank before the number.
1. The science coordinator developed a science test to determine
who among the students will be placed in an advanced science
section. The students who scored high in the science test were
selected. After two quarters, the grades of the students in the
advanced science were determined. The scores in the science
test were correlated with the science grades to check if the
science test was accurate in the selection of students. What type
of validity was used?
Your task is to determine whether the spelling test is reliable and valid
using the data to determine the following: (1) split-half, (2) Cronbach’s
alpha, (3) predictive validity with the English grades, (4) convergent validity
between words with one stress and words with two stresses, and (5) difficulty
index of each item.
Student Item Item Item Item Item Item Item Item Item Item English
No. 1 2 3 4 5 6 7 8 9 10 grades
1 1 0 0 1 1 1 0 1 1 0 80
2 0 0 0 1 1 1 1 1 0 0 81
3 1 1 0 0 1 0 1 0 1 1 83
4 0 1 0 0 1 1 1 1 1 0 85
5 0 1 1 0 1 1 1 0 1 1 84
6 1 0 1 0 1 1 1 1 1 1 89
7 1 0 1 1 1 1 1 1 0 1 87
8 1 1 1 0 1 1 1 1 1 1 87
9 1 1 1 1 1 1 1 1 0 1 89
10 1 1 1 1 0 0 1 1 1 1 90
11 0 1 1 1 0 1 1 1 1 0 90
12 1 0 1 1 1 1 1 1 1 1 87
13 1 1 1 1 1 1 1 0 1 1 88
14 1 1 0 1 1 1 1 1 1 1 88
15 1 1 1 1 1 0 1 1 0 1 85
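For item (1) of this task, a first-half/second-half split with the Spearman-Brown correction can be sketched as follows. The split (items 1-5 vs. 6-10) is one common choice, and the helper names are ours; the 0/1 matrix is the one tabulated above, without the English grades.

```python
import math

def pearson_r(x, y):
    """Raw-score Pearson correlation between two score lists."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    num = n * sum(a * b for a, b in zip(x, y)) - sx * sy
    den = math.sqrt((n * sum(a * a for a in x) - sx ** 2)
                    * (n * sum(b * b for b in y) - sy ** 2))
    return num / den

def split_half_reliability(matrix):
    """Correlate the two half scores, then step up with Spearman-Brown."""
    half1 = [sum(row[:5]) for row in matrix]   # items 1-5
    half2 = [sum(row[5:]) for row in matrix]   # items 6-10
    r = pearson_r(half1, half2)
    return 2 * r / (1 + r)                     # full-length estimate

# 0/1 responses of the 15 students (item columns 1-10 from the table)
data = [
    [1,0,0,1,1,1,0,1,1,0], [0,0,0,1,1,1,1,1,0,0], [1,1,0,0,1,0,1,0,1,1],
    [0,1,0,0,1,1,1,1,1,0], [0,1,1,0,1,1,1,0,1,1], [1,0,1,0,1,1,1,1,1,1],
    [1,0,1,1,1,1,1,1,0,1], [1,1,1,0,1,1,1,1,1,1], [1,1,1,1,1,1,1,1,0,1],
    [1,1,1,1,0,0,1,1,1,1], [0,1,1,1,0,1,1,1,1,0], [1,0,1,1,1,1,1,1,1,1],
    [1,1,1,1,1,1,1,0,1,1], [1,1,0,1,1,1,1,1,1,1], [1,1,1,1,1,0,1,1,0,1],
]
print(round(split_half_reliability(data), 2))
```

An odd-even split is an equally defensible choice and generally gives a different estimate.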
D. Create a short test and report its validity and reliability. Select a grade
level and subject. Choose one or two learning competencies and make at
least 10-20 items for these two learning competencies. Consult your
teacher on the items and the table of specification.
1. Have your items checked by experts if they are aligned with the
selected competencies.
2. Revise your items based on the reviews provided by the experts.
3. Make a layout of your test and administer it to about 100 students.
4. Encode your data; you may use an application to compute the
needed statistical analysis.
5. Determine the following:
Split-half reliability
Cronbach’s alpha
Item difficulty and discrimination
Write a report on your procedure. The report will contain the following parts:
Introduction. Give the purpose of the study. Describe the test
measures, its component, the competencies selected, and kind of items.
Rationalize the need to determine the validity and reliability of the test.
Method. Describe the participants who took the test. Describe what the
test measures, number of items, test format, and how content validity was
established. Describe the procedure on how data was collected or how the
test was administered. Describe what statistical analysis was used.
Results. Present the results in a table and provide the necessary
interpretations. Make sure to show the results of the split-half reliability,
Cronbach’s alpha, construct validity of the items with the underlying factors,
convergent validity of the domains, and item difficulty and discrimination.
Discussion. Provide implications about the test validity and reliability.
E. Multiple Choice
Choose the letter of the correct and best answer in every item.
1. Which is a way in establishing test reliability?
A. The test is examined if free from errors and properly administered.
B. Scores in a test with different versions are correlated to test if they
are parallel.
C. The components or factors of the test contain items that are
strongly uncorrelated.
D. Two or more measures are correlated to show the same
characteristics of the examinee.
2. What is being established if items in the test are consistently answered
by the students?
A. Internal consistency C. test-retest
B. Inter-rater reliability D. split-half
3. Which type of validity was established if the components or factors of a
test are hypothesized to have a negative correlation?
A. Construct validity C. Content validity
B. Predictive validity D. Divergent validity
4. How do we determine if an item is easy or difficult?
A. An item is easy if majority of students are not able to provide the
correct answer. The item is easy if majority of the students are able
to answer correctly.
B. An item is difficult if majority of students are not able to provide the
correct answer. The item is difficult if majority of the students are
able to answer correctly.
C. An item can be determined difficult if the examinees who scored high
on the test answer the item correctly more often than the examinees
who got low scores. If not, the item is easy.
D. An item can be determined easy if the examinees who scored high on
the test answer the item correctly more often than the examinees who
got low scores. If not, the item is difficult.
5. Which is used when the scores of the two variables measured by a test
taken at two different times by the same participants are correlated?
A. Pearson r correlation C. Significance of the correlation
B. Linear regression D. positive and negative correlation
References
David et al. (2020). Assessment in Learning 1. Manila: Rex Book Store.
De Guzman, E. and Adamos, J. (2015). Assessment of Learning 1. Quezon
City: Adriana Publishing Co., Inc.
Exploring Reliability in Academic Achievement. Retrieved from
https://chfasoa.uni.edu/reliabilityandvalidity.htm
Price et al. (2017). Reliability and Validity of Measurement. In Research
Method in Psychology (3rd ed.). California, USA: The Saylor
Foundation. Retrieved from
https://opentext.wsu.edu/carriecuttler/chapter/reliability-and-validity-of-
measurement/
Professional Testing, Inc. (2020). Building High Quality Examination
Programs. Retrieved from
http://www.proftesting.com/test_topics/steps_9.php
The Graide Network, Inc. (2019). Importance of Validity and Reliability in
Classroom Assessments. Retrieved from
https://www.thegraidenetwork.com/blog-all/2018/8/1/the-two-keys-to-
quality-testing-reliability-and-validity
CHAPTER 4
ORGANIZATION, UTILIZATION, AND COMMUNICATION OF TEST
RESULTS
Overview
As we have learned in previous lessons, tests used to measure
learning or achievement are a form of assessment. They are undertaken to
gather data about student learning. These test results can assist teachers and
the school in making informed decisions to improve curriculum and
instruction. Thus, collected information such as test scores should be
organized to appreciate its meaning. Usually, the use of charts and tables are
ERNIE C. CERADO, PhD/MA. DULCE P. DELA CERNA, MIE 158
SULTAN KUDARAT STATE UNIVERSITY
Objective
Upon completion of the chapter, the students can demonstrate their
knowledge, understanding and skills in organizing, presenting, utilizing and
communicating the test results.
Pre-discussion
What to Expect?
At the end of the lesson, the students can:
1. organize the raw data from a test;
2. construct a frequency distribution;
3. acquire knowledge on the basic rules in preparing tables and graphs;
4. summarize test data using an appropriate table or graph;
5. use Microsoft Excel to construct appropriate graphs for a data set;
6. interpret the graph of a frequency and cumulative frequency
distribution; and
7. characterize a frequency distribution graph in terms of skewness and
kurtosis.
Frequency Distribution
Somewhat agree 15
Not sure 20
Somewhat disagree 20
Strongly disagree 15
Total 100
Step 2:
The second step is to decide the number and size of the groupings to be
used; in this process, first decide the size of the class interval.
According to H.E. Garrett (1985:4), the most “commonly used grouping
intervals are 3, 5, 10 units in length.” The size should be such that the number of
Step 3:
Prepare the class intervals. It is natural to start the intervals with their
lowest scores at multiples of the size of the intervals. For example, when the
interval is 3, it has to start with 9, 12, 15, 18, etc. Also, when the interval is 5,
it can start with 5, 10, 15, 20, etc.
The class intervals can be expressed in three different ways:
First Type:
The first type of class interval includes all scores.
For example:
10 - 15 includes scores of 10, 11, 12, 13 and 14 but not 15
15 - 20 includes scores of 15, 16, 17, 18 and 19 but not 20
20 - 25 includes scores of 20, 21, 22, 23 and 24 but not 25
In this type of classification, the upper limit of each class is repeated
as the lower limit of the next class.
This repetition can be avoided in the following type.
Second Type:
In this type the class intervals are arranged in the following way:
10 - 14 includes scores of 10, 11, 12, 13 and 14
15 - 19 includes scores of 15, 16, 17, 18 and 19
20 - 24 includes scores of 20, 21, 22, 23 and 24
Here, there is no question of confusion about the scores in the higher
and lower limits as the scores are not repeated.
Third Type:
Sometimes, we are confused about the exact limits of class intervals
because very often it is necessary in computations to work with exact limits.
A score of 10 actually extends from 9.5 to 10.5, and a score of 11 from 10.5
to 11.5.
Thus, the interval 10 to 14 actually contains scores from 9.5 to 14.5. The
same principle holds no matter what the size of interval or where it begins in
terms of a given score. In the third type of classification we use the real lower
and upper limits.
9.5 - 14.5
14.5 - 19.5
19.5 - 24.5 and so on.
Step 4:
Once we have adopted a set of class intervals, we need to list the
scores in their respective class intervals. Then, we have to put tallies in
their proper intervals. (See illustration in Table 1.)
Step 5:
Make a column to the right of the tallies headed “f” (frequency). Write
the total number of tallies for each class interval under column f. The sum of
the f column will be the total number of cases, N.
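Steps 1-5 can be sketched together in code. The scores below are hypothetical; the intervals use the second type of limits (e.g., 10-14), start at multiples of the interval size, and the f column sums to N:

```python
def frequency_distribution(scores, width=5):
    """Tally scores into class intervals of the given width (second-type limits)."""
    lo = (min(scores) // width) * width      # lowest interval starts at a multiple
    hi = (max(scores) // width) * width
    table = []
    for start in range(hi, lo - 1, -width):  # list the highest interval first
        f = sum(start <= s < start + width for s in scores)
        table.append((f"{start}-{start + width - 1}", f))
    return table

scores = [12, 14, 15, 17, 18, 20, 21, 21, 23, 24]   # hypothetical raw scores
for interval, f in frequency_distribution(scores):
    print(interval, f)
```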
The next matrix contains the scores of students in mathematics.
Tabulate the scores into frequency distribution using a class interval of 5
units.
Solution:
45 - 49 3 47 56 7 93% 12%
40 - 44 2 42 58 4 97% 7%
35 - 39 2 37 60 2 100% 3%
7. The graphic form of data is also a very useful device to suggest the
direction of investigations. Investigations cannot be conducted without
regard to the desired aim, and the graphic form helps fulfill that aim by
suggesting the direction of investigations.
8. In short, the graphic form of statistical data converts complex and huge
data into a readily intelligible form and introduces an element of
simplicity into it.
The graph's vertical axis should always start with zero. A usual type of
distortion is starting this axis with values higher than zero. Whenever it
happens, differences between variables are overestimated, as can been seen
in Figure 1.
3. Bar Graph
This graph is often used to present frequencies in categories of a
qualitative variable. It looks very similar to a histogram, constructed in the
same manner, but spaces are placed in between the consecutive bars.
The columns represent the categories, and the height of each bar, as in a
histogram, represents the frequency. If experimental data are graphed, the
independent variable in categories is usually plotted on the x-axis. When the
variable on the horizontal or x-axis is categorical, bar graphs can also be
presented horizontally. Bar graphs are very useful in comparing the test
performance of groups categorized in two or more variables. Following are
some examples of bar graphs.
Selection of the most appropriate graph for a given set of data can be
facilitated by some computer software or applications. A common application
is the Chart Wizard in Microsoft Excel which offers an array of different charts
along with several variants.
are empirical data that illustrate situations in the real world. With the world
population reaching 7.6 billion, you can imagine hundreds of possible
frequency distributions representing different groups and subgroups taken
from an infinitely large population. It is reasonable to expect that there will be
variations in the shapes of frequency distributions. Researchers, scientists,
and educators have found that empirical data, when recorded, fit the following
shapes of frequency distributions.
What is skewness?
Examine the graphs below.
The graphs of Figures 10c and 10d are asymmetrical in shape. The
degree of asymmetry of a graph is called skewness. Basic principles of a
coordinate system tell us that, as we move toward the right of the x-axis, the
numerical value increases; likewise, as we move up the y-axis, the scale value
becomes higher. Thus, in a negatively skewed distribution, more students get
higher scores, and the tail of lower frequencies points to the left, toward the
lower scores. On the other hand, in a positively skewed distribution, the
scores cluster on the left side. This means that more students get lower
scores, and the tail of lower frequencies points to the right, toward the
higher scores.
The graph in Figure 10b is a rectangular distribution. It occurs when the
frequency of each score or class interval is the same or nearly so, which is
why it is also called a uniform distribution.
We have differentiated the four graphs in terms of skewness, which
refers to their symmetry or asymmetry (non-symmetry). Another way of
characterizing a frequency distribution is with respect to the number of
“peaks” seen on the curve. Refer to the following graphs.
What is kurtosis?
Another way of contrasting frequency distributions is illustrated below.
Let us consider the graphs of three frequency distributions in Figure 13.
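Skewness and kurtosis can also be computed rather than only read off a graph. Below is a minimal pure-Python sketch using the moment definitions; the score lists are hypothetical examples, not data from the module.

```python
def skewness(xs):
    """Moment coefficient of skewness: m3 / m2^(3/2).
    Negative: tail toward lower scores; positive: tail toward higher scores."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

def kurtosis_excess(xs):
    """Excess kurtosis: m4 / m2^2 - 3.
    Positive: heavy tails; negative: light tails, relative to a normal curve."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / m2 ** 2 - 3

symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]
right_tailed = [1, 1, 1, 2, 2, 3, 4, 8]
print(skewness(symmetric))     # ≈ 0 for a symmetric set
print(skewness(right_tailed))  # > 0: tail toward the higher scores
```

A symmetric set of scores gives a skewness of about zero, while the set with one very high score is positively skewed, matching the descriptions above.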
Summary
Test data are better appreciated and communicated if they are arranged,
organized, and presented in a clear and concise manner.
A frequency distribution is a list, table or graph that displays the frequency
of various outcomes in a sample. Each entry in the table contains the
frequency or count of the occurrences of values within a particular group
or interval.
There are steps to follow in constructing a frequency distribution.
Tables and graphs are common tools that help readers better understand
the test results.
The graphic method is mainly used to give a simple, permanent idea and
to emphasize the relative aspect of data.
Tabulation of statistical data is a prerequisite to graphic
presentation.
Data are plotted on a graph from a table. This means that graphic form
cannot replace tabular form of data but can definitely supplement it.
Skewness is a measure of symmetry, or more precisely, the lack of
symmetry. A distribution, or data set, is symmetric if it looks the same to
the left and right of the center point.
Kurtosis is a measure of whether the data are heavy-tailed or light-tailed
relative to a normal distribution. Data sets with high kurtosis tend to have
heavy tails, or outliers, while data sets with low kurtosis tend to have light
tails, or lack of outliers.
Enrichment
1. Explore the Chart Wizard facility of Microsoft Excel application.
2. Read the following articles:
a. “How to Create a Chart in Excel using the Chart Wizard” from
https://www.middlesex.mass.edu/KB/Articles/Public/127/.
b. “Are the Skewness and Kurtosis Useful Statistics?” from
https://www.spcforexcel.com/knowledge/basic-statistics/are-
skewness-and-kurtosis-useful-statistics
3. Watch the following videos:
a. “MS Excel - Pie, Bar, Column & Line Chart” by Tutorials Point (India)
Ltd. (2018, January 15) from https://www.youtube.com/watch?
v=Z2gzLYaQatQ.
b. “How to Construct a Frequency Distribution Table” from
https://www.youtube.com/watch?v=j6ftiC2o6O4.
Assessment
A. Let us see how well you understood what have been presented in this
lesson.
1. Consider the table showing the results of a reading examination of a set of students.
Class Interval   Midpoint   f    Cumulative    Cumulative
                                 Frequency     Percentage
140-144          142        2
135-139          137        7
130-134          132        9
125-129          127        14
120-124          122        10
115-119          117        6
110-114          112        2    2
g. The entry in the lowest class interval of the 4th column is done for
you. Starting from the lowest class interval, can you fill in the remaining
blanks upward? How did you do it?
h. Look at the entire column on cumulative frequency. What is the
cumulative frequency of the highest class interval? How do you
compare this cumulative frequency with the number of students
who took the test?
i. The last column is labeled cumulative percentage. What should be
the first entry at the bottom of the column? How did you determine
it? Can you fill in the entire column with the right percentages? Can
you do this in two ways? Which way is easier?
j. Take a look at the values in the table, in particular, the frequency
column. What type of distribution (positively skewed, negatively
skewed, and symmetrical) is depicted by the given values? Why do
you say so?
k. What type of graph is most appropriate for this frequency table?
2. Analyze the figures in the succeeding pages and answer the questions
that pertain to each graph.
For Figure 15:
a. What is the shape of the frequency distributions as to symmetry?
b. What is the estimated value of the highest score in each
distribution? What does this value indicate?
c. Which section got the highest average? Which section got the
lowest?
3. Now, to further see how well you were able to comprehend all the topics
discussed earlier, fill in the answer to each box in the diagram below.
B. Accomplish the following activities to know the extent to which you have
understood the concepts introduced in this lesson.
1. The following aptitude test scores have been recorded in a guidance office.
140 88 115 91 96
93 117 99 101 108
98 123 119 146 107
107 111 100 125 110
83 127 116 113 104
126 114 110 114 138
109 102 113 106 90
107 91 102 103 135
104 101 131 87 124
113 135 126 112 140
5. Cumulative percentage.
f. Construct a histogram from the given scores.
g. Draw a frequency polygon superimposed on the histogram you
constructed in (f).
h. Using your data in (e.5), draw a cumulative percentage polygon.
Figure 19 shows a graph constructed from a first-quarter exam in Science
gathered from 193 STEM students: 100 males and 93 females. Give three
statements on the test performance of STEM students as depicted in the figure.
Self-Confidence Inventory
Put a check mark (✓) in the column that best describes how you see yourself in
the following situations. There are no right or wrong responses, so feel free to
express your true self. Results will be kept strictly confidential.
Scale: Always (5), Almost Always (4), Sometimes (3), Seldom (2), Never (1)
1. I feel that I have a number of positive qualities.
2. I feel I am a worthy person to my family, friends, and classmates.
3. I am inclined to think I am a failure.
4. I have as many accomplishments as others of my age have.
5. I feel I do not have much to be proud of in my family.
6. I am happy with who I am.
7. I feel I have not contributed much as a son/daughter to my parents.
8. I feel that my classmates are afraid to approach me for help.
9. I am afraid to make mistakes.
10. I am not bothered about what people say about me.
11. With how I am going, the future will be bright for me.
12. I get excited when I try new things.
13. I cannot sleep when I hear negative things about me.
14. I am as important as other people.
15. I feel depressed when I do not succeed in what I plan to achieve.
4. What period shows the highest increase in the number of students passing the subject?
A. 1st Quarter
B. 2nd Quarter
C. 3rd Quarter
D. 4th Quarter
5. What is the rate of increase of passing from the 2nd to 3rd quarter?
A. 75%
B. 50%
C. 33%
D. 25%
F. Supplemental Exercises
1. The following is a frequency distribution of examination marks:
Class interval f
90 – 94 6
85 – 89 9
80 – 84 7
75 – 79 13
70 – 74 14
65 – 69 19
60 – 64 11
55 – 59 11
50 – 54 9
45 – 49 8
40 – 44 8
Answer the following questions. You are free to consult your teacher
should you have concerns over these exercises.
a. What is the size of the class interval?
Educators’ inputs
I have been teaching statistics for many years at both undergraduate
and graduate programs in the College of Education. I am happy with the
illustration of statistics in the area of assessment and evaluation. It gave me
the opportunity to teach statistics in different contexts. It provided me with a
practical application of statistics in the assessment of students’ learning.
Topics in statistics like data handling can be boring to many. When I present,
for example, test score data with 100 observations, my students usually just
look at the data and wait for my next step. When I group data into a frequency
distribution table, I see my students amazed with the summarized and
condensed form of information. When I create a graph out of the frequency
table for grouped data, I see students becoming more attentive with pictures
and graphs. However, drawing graphs in the traditional way, with different
steps to follow and do's and don'ts, can cause anxiety for many. When I
create graphs, I usually do it through SPSS software, which has been an
indispensable tool in my statistics class. When I use this software vis-à-vis the
traditional way of computing, students are amazed. While I should be happy
when my students show interest in what they learn in my class, I am also
concerned that there is an erosion in my students' ability to read hidden
information in a graph, which a teacher should be concerned about. Being
mindful of this, I have some favorite lines that I use when teaching the use of
tables and graphs in organizing and presenting test data:
It seems you prefer seeing pictures rather than tables and reading text.
But let me give you some words of caution. When you read materials with
pictures or graphs in published works on competitive achievement tests, or
other forms of advertisement and reports, be critical. A picture or a graph may
be deceiving. With some tricks, you can be misled. With some mechanical
manipulations, like compressing or expanding graphs with incorrect scaling,
you can be deceived. Do not rely completely on the visuals, examine the
underlying information and detect the missing information.
Pre-discussion
Discussions in this lesson will build upon the concepts and examples
presented in the preceding lesson, which focused on the tabular and graphical
presentation and interpretation of test results. This time, other ways of
summarizing test data using descriptive statistics, which provides a more
precise means of describing a set of scores, will be introduced. The word
“measures” is commonly associated with numerical and quantitative data.
Hence, the prerequisite to understanding the concepts in this lesson is your
basic knowledge of mathematics, e.g., summation of values, simple
operations on integers, squaring and finding square roots, etc.
What to Expect?
At the end of the lesson, the students can:
1. find the mean, median, and mode of test score distribution;
2. determine the different measures of dispersion of test scores;
3. calculate the measure of position;
4. relate standard deviation and normal distribution;
5. transform raw scores to standardized scores (z, T and stanine);
6. compute the measure of covariability using the long process and
Excel; and
7. interpret test data applying measures of central tendency, variability,
position, and covariability.
Mean. This is the most preferred measure of central tendency for use
with test scores, also referred to as the arithmetic mean. The computation is
very simple. When a student has added up the examination scores he/she
made in a subject during the grading period and divided the sum by the
number of examinations taken, then he/she has computed the arithmetic mean.
That is, X̄ = ΣX / N, where X̄ = the mean, ΣX = the sum of all the scores, and N = the number of scores.
The mean is the sum of all the scores from 53 down to the last score,
which is 35, divided by the total number of cases.
That is,
X̄ = ΣX / N = (53 + 36 + 57 + … + 60 + 49 + 35)/100
There are many ways of computing the mean. The traditional long and
tedious computation techniques have outlasted their relevance due to the
advancement of technology and the emergence of statistical software. Using
your scientific calculator, you will see the relevant statistical symbols; just
follow the simple steps indicated in its guide. There are also simple steps in Microsoft
Excel. Different versions of the statistical software SPSS offer the fastest way
of obtaining the mean, even with hundreds of scores in a set. There is no loss
of original information because you are dealing with original individual scores.
The use of statistical software will be explained later.
When scores are grouped in the traditional way, you can see at a glance
how they are distributed among the range of values in a condensed
manner. You can even estimate the average of the scores by looking at the
frequency in each class interval. In the absence of a statistical program, the
mean of grouped data can be computed with the formula X̄ = ΣfXm / N,
where f is the frequency of each class interval and Xm its midpoint.
Median. It is the value that divides the ranked scores into halves, or the
middle value of the ranked scores. If the number of scores is odd, then there
is only one middle value, which gives the median. However, if the number of
scores in the set is even, then there are two middle values, and the median is
their average. But if there are more than 50 scores, arranging the scores and
finding the middle value will take time. For grouped data, this formula will
help you determine the median: Mdn = LL + ((N/2 − cf)/f) × i, where LL is
the exact lower limit of the median class, cf the cumulative frequency below
the median class, f the frequency of the median class, and i the class size.
4. Find the exact limits of the median class. In this case, the class is
44.5 - 49.5, so the lower limit is 44.5.
Summing up these steps and substituting these values to the formula,
we have:
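The grouped-data median computation outlined in the steps above can be sketched in Python. This is a minimal sketch using the standard formula Mdn = LL + ((N/2 − cf)/f) × i; the distribution below is hypothetical, not Table 2 from the module.

```python
def grouped_median(intervals):
    """Median from a grouped frequency distribution.

    `intervals` is a list of (lower_bound, frequency) pairs ordered from
    lowest to highest class, all with the same width. Applies
    Mdn = LL + ((N/2 - cf) / f) * i, using exact class limits.
    """
    width = intervals[1][0] - intervals[0][0]
    n = sum(f for _, f in intervals)
    half = n / 2
    cf = 0  # cumulative frequency below the candidate median class
    for lower, f in intervals:
        if cf + f >= half:
            exact_lower = lower - 0.5  # exact lower limit of the median class
            return exact_lower + (half - cf) / f * width
        cf += f

# Hypothetical distribution: (lower bound, frequency), class width 5
dist = [(35, 2), (40, 4), (45, 8), (50, 4), (55, 2)]
print(grouped_median(dist))  # 47.0
```

Here N = 20, the median class is 45–49 with exact lower limit 44.5, cf = 6, and f = 8, giving 44.5 + (10 − 6)/8 × 5 = 47.0.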
Mode. It is the easiest measure of central tendency to obtain. The mode is the
score or value with the highest frequency in the set of scores. If the scores are
arranged in a frequency distribution, the mode is estimated as the midpoint of
the class interval which has the highest frequency. This class interval with the
highest frequency is also called the modal class. In a graphical representation
of the frequency distribution, the mode is the value on the horizontal axis at
which the curve is at its highest point (peak). If there are two highest points,
then there are two modes, as discussed in a previous lesson. When all the
scores in a group have the same frequency, the group of scores has no mode.
Considering the test data in Table 2, it can be seen that the highest
frequency of 21 occurred in the class interval 45 - 49. The rough estimate of
the mode is 47, which is the midpoint of that class interval.
As manual computation of the mean, median, and mode is a lengthy
and tedious process, technology makes it simpler through the Microsoft
Excel application. Here is a simple guide to follow.
Calculating the mean of scores located in several columns and rows is
also possible, provided they are all selected or defined.
Median in Excel
Mode in Excel
Mode helps you find the value that occurs most often. When you are
working with a large amount of data, this function can be a lot of help. To
find the most frequently occurring value in Excel, use the MODE function and
select the range you want to find the mode of. In our example below, we use
=MODE(B2:B12), and since 2 students have scored 55, we get the answer
55.
In situations where there are two or more modes in your data set, the
Excel MODE function will return the lowest mode.
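If you do not have Excel at hand, the same three measures can be obtained with Python's standard statistics module. The scores below are hypothetical, not the B2:B12 data from the illustration.

```python
import statistics

# Hypothetical test scores (with 55 occurring twice, so it is the mode)
scores = [40, 45, 50, 55, 55, 60, 62, 70, 75, 80, 88]

mean = statistics.mean(scores)      # like =AVERAGE(B2:B12)
median = statistics.median(scores)  # like =MEDIAN(B2:B12)
mode = statistics.mode(scores)      # like =MODE(B2:B12)
print(mean, median, mode)
```

One design note: when a data set has several modes, Python 3.8+ returns the first one encountered, whereas Excel's MODE function, as mentioned above, returns the lowest.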
Scale of Measurement
There are four levels of measurement that apply to the treatment of test
data: nominal, ordinal, interval, and ratio. In nominal measurement, the
number is used for labeling or identification purposes only. An example is the
You can see that different distributions may be symmetrical, may have
same average values (mean, median, mode), but how the scores in each
distribution are spread out around these measures are different. In A, as
Range. It is the difference between the highest (XH) and the lowest (XL)
scores in a distribution. It is the simplest measure of variability but also
considered as the least accurate measure of dispersion because its value is
determined by just two scores in the group. It does not take into consideration
the spread of all scores; its value simply depends on the highest and lowest
scores. Its value could be drastically changed by a single value. Consider the
following examples:
Determine the range for the following scores: 9, 9, 9, 12, 12, 13, 15, 15,
17, 17, 18, 18, 20, 20, 20.
Range = XH - XL
= 20 – 9
Range= 11
Now, replace a high score in one of the scores, say, the last score and
make it 50. The range becomes:
Range = XH - XL
= 50 – 9
= 41
We observed that replacing just a single score increased the range greatly.
This can be interpreted as a large dispersion of test scores; however, when you
look at the individual scores, it is not.
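The two range calculations above can be reproduced in a couple of lines of Python, showing how a single outlier inflates the range:

```python
def score_range(scores):
    """Range: highest score minus lowest score."""
    return max(scores) - min(scores)

scores = [9, 9, 9, 12, 12, 13, 15, 15, 17, 17, 18, 18, 20, 20, 20]
print(score_range(scores))  # 20 - 9 = 11

# Replace the last score with 50: one outlier inflates the range
scores[-1] = 50
print(score_range(scores))  # 50 - 9 = 41
```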
Note that while the distributions contain different scores, they have the
same mean. If we ask how well each mean represents the scores in its
respective distribution, there will be no doubt about the mean of distribution C
because each score in the distribution is 12. How about in distributions A and
B? For these two distributions, the mean of 12 is a better estimate of the
scores in distribution B than in distribution A. We can see that no score in B is
more than 4 points away from mean of 12. However, in distribution A, half of
the 12 scores is 4 points or more away from the mean. We can also say that
there is less variability of scores in B than in A. However, we cannot just
determine which distribution is dispersed or not by merely looking at the
numbers, especially when there are many scores. We need a reliable index of
variability, such as variance or standard deviation that takes into consideration
all the scores.
Recall that ∑(X − X̄) is the sum of the deviation scores from the mean,
which is equal to zero. As such, we square each deviation score, then sum up
all the squared deviation scores, and divide the sum by the number of cases;
this yields the variance. Extracting its square root gives us the standard
deviation.
where μ = the population mean
Class A                         Class B
X     (X − X̄)    (X − X̄)²      X     (X − X̄)    (X − X̄)²
22    22 − 12    100            16    16 − 12    16
18    18 − 12    36             15    15 − 12    9
16    16 − 12    16             15    15 − 12    9
14    14 − 12    4              14    14 − 12    4
12    12 − 12    0              12    12 − 12    0
11    11 − 12    1              11    11 − 12    1
9     9 − 12     9              11    11 − 12    1
7     7 − 12     25             9     9 − 12     9
6     6 − 12     36             9     9 − 12     9
5     5 − 12     49             8     8 − 12     16
X̄ = 12   ∑(X − X̄)² = 276       X̄ = 12   ∑(X − X̄)² = 74
The values 276 and 74 are the sums of the squared deviations of scores
in Class A and Class B, respectively. If each is divided by the number of
scores in its class, this gives the variance (S²):
The values above are both in squared units, while our original scores
are not. When we find their square roots, we obtain values that are on the
same scale of units as the original set of scores. These give the respective
standard deviation (S) of each class, computed as follows:
When you are finding the variance using Excel, you can simply use the
VARP function (VAR.P in newer versions), select the range, and you will find
the desired variance. Note that VARP divides by the number of cases, as in
the manual computation above, while the VAR function divides by n − 1 and
returns the sample variance instead. We take the data in Class A to find the
variance of the scores obtained by students, so we use =VARP(A2:A11). In
the case of Class B, the same function is used, but the data in cells B2 to
B11 are considered. Thus, we use =VARP(B2:B11).
In both instances, the Excel values match the results of the manual
process. For a larger number of scores in a distribution, Microsoft Excel
or other software is more appropriate and efficient for obtaining the variance
and standard deviation. This can be done in a few seconds if you have already
entered and saved the data used to get the measures of central tendency. In
addition, the variance and standard-deviation functions can still be used even
if scores are encoded in several columns and rows, as shown in the next
illustration.
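The Class A and Class B computations can be verified with a short Python sketch. Like the manual process in the text, it divides the sum of squared deviations by the number of cases N (the population variance):

```python
from math import sqrt

def variance(scores):
    """Population variance: mean of the squared deviations from the mean."""
    n = len(scores)
    mean = sum(scores) / n
    return sum((x - mean) ** 2 for x in scores) / n

class_a = [22, 18, 16, 14, 12, 11, 9, 7, 6, 5]
class_b = [16, 15, 15, 14, 12, 11, 11, 9, 9, 8]

var_a = variance(class_a)  # 276 / 10 = 27.6
var_b = variance(class_b)  # 74 / 10 = 7.4
sd_a, sd_b = sqrt(var_a), sqrt(var_b)
print(var_a, var_b, round(sd_a, 2), round(sd_b, 2))
```

The smaller variance and standard deviation of Class B confirm that its scores are less spread out around the common mean of 12.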
Sk = 3(X̄ − Mdn) / SD
where Sk = skewness
X̄ = mean
Mdn = median
SD = standard deviation
As the difference between mean and median moves farther from 0, the
coefficient of skewness changes to either lower or higher values.
Measures of Position
While measures of central tendency and measures of dispersion are
used often in assessment, there are other methods of describing data
distributions such as using measures of position or location. It is about the
score’s position in the distribution. What are these measures?
Figure 6. Quartiles
Note that in the above example, the upper and lower 50% contain an
even number of center values, so the median in each half is the average of
the two middle values, giving Q = 10.
Decile. It divides the distribution into 10 equal parts. There are 9 deciles, such
that 10% of the distribution is equal to or less than decile 1 (D1), 20% of the
scores are equal to or less than decile 2 (D2), and so on. A student whose mark
is below the first decile is said to belong in decile 1. A student whose mark is
above the 9th decile belongs to decile 10. If there is only a small number of data
values, the decile is not appropriate to use.
Percentile. It divides the distribution into one hundred equal parts. In the
same manner, for percentiles, there are 99 percentiles such that 1% of the
scores are less than the first percentile, 2% of the scores are less than the
second percentile, and so on. For example, if you scored 95 in a 100-item
test, and your percentile rank is 99th, then this means that 99% of those who
took the test performed lower than you. This also means that you belong to
the top 1% of those who took the test. In many cases, percentiles are wrongly
interpreted as percentage score. For example, 75% as a percentage score
means you get 75 items correct out of a hundred items, which is a mark or
grade reflecting performance level. But a percentile is a measure of position,
such that a mark at the 75th percentile means 75% of the students who took
the test got a lower score than you, or your score is located at the upper 25%
of the class who took the same test. For very large data sets, the percentile is
appropriate to use for accuracy. This is one reason why percentiles are
commonly used in national assessments or university entrance examinations
with datasets of scores in the thousands.
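The idea of a percentile rank, the percentage of scores falling below a given score, can be sketched directly in Python. The scores below are hypothetical.

```python
def percentile_rank(scores, x):
    """Percentage of scores in the group that fall below x."""
    below = sum(1 for s in scores if s < x)
    return 100 * below / len(scores)

# Hypothetical test scores for a group of ten students
scores = [60, 65, 70, 72, 75, 78, 80, 85, 90, 95]
print(percentile_rank(scores, 85))  # 70.0: 7 of the 10 scores are below 85
```

A student who scored 85 here has a percentile rank of 70, meaning 70% of the group scored lower, not that the student answered 70% of the items correctly.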
standard deviation of P15.00. Can we say that the latter distribution is more
spread out than the former? Or can we compare standard deviations in meters
and in pesos? The answer seems obvious. We cannot conclude anything by
direct comparison of measures of absolute dispersion because they are in
different units or different categories. In the first example, one is a distribution
of mathematics scores while the other is a distribution of science scores. To
make the
comparison logical, we need a measure of relative dispersion, also known
as the coefficient of variation (CV). This is simply the ratio of the standard
deviation to the mean, expressed as a percentage. Suppose the mean score
of students in mathematics is 40 with a standard deviation of 10. The
coefficient of variation is:
CVmath = (10/40)(100)
= 0.25 (100)
= 25%.
Suppose the mean score of students in science is 18 with standard
deviation of 5. The coefficient of variation is:
CVsci = (5/18)(100)
= 0.277 (100)
≈ 28%.
Looking at 10 and 5 as the standard deviations of the mathematics and
science scores, respectively, you might be led to judge that the set of scores
in mathematics is twice as dispersed as the scores in science. From the
computed coefficients of variation as measures of relative dispersion, we can
clearly see that the scores in mathematics are in fact more homogeneous than
the scores in science.
deviation from the mean score. In other words, each portion under the curve
contains a fixed percentage of cases, as follows:
68% of the scores fall between one standard deviation below and
above the mean;
95% of the scores fall between two standard deviations below and
above the mean;
99.73% of the scores fall between three standard deviations below
and above the mean.
Figure 8 illustrates the theoretical model.
From the above figures, we can state the properties of the normal
distribution:
1. The mean, median, and mode are all equal.
2. The curve is symmetrical. As such, the area of any specific region on
the left is equal to the area of its corresponding region on the right.
3. The curve changes from concave to convex and approaches the X-
axis, but the tails never touch the horizontal axis.
4. The total area under the curve is equal to 1.
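The fixed percentages quoted above can be checked numerically from the standard normal curve using the error function in Python's math module:

```python
from math import erf, sqrt

def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def area_between(k):
    """Proportion of scores within k standard deviations of the mean."""
    return normal_cdf(k) - normal_cdf(-k)

print(round(area_between(1), 4))  # about 0.6827
print(round(area_between(2), 4))  # about 0.9545
print(round(area_between(3), 4))  # about 0.9973
```

These reproduce the 68%, 95%, and 99.73% figures for one, two, and three standard deviations around the mean.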
Standard Scores
In the preceding sections, we discussed raw scores, which are the
original scores collected from an actual testing activity. However, there are
situations where computing measures from raw scores may not be enough.
Consider a situation where you, as a student, want to know in which subjects
you performed best and poorest, to determine where you need to exert more
effort. Or maybe in the past, you took entrance examinations in more than one
university and asked yourself in which university you performed best. In cases
like these, we cannot find the answer by merely relying on a single score.
More concretely, if you get a score of 86 in English and 90 in Physics, you
cannot conclude that you performed better in Physics than in English, even
though 90 is higher than 86. Say you later learned that the mean score of the
class in English was 80. A score like 86 or 90 is not meaningful unless it is
compared with the other test scores. In particular, a score can be interpreted
more meaningfully if we know the mean and variability of the other scores to
which that single score belongs. Knowing these, a raw score can be converted
into a standard score.
Z-score. There are many kinds of standard scores. The most useful is the z-
score, which is often used to express a raw score's relation to the mean:
z = (X − X̄)/SD. Knowing the mean, you are able to tell whether your test
score, say X, is above or below the average
score. However, you cannot say whether your test score or grade is better or
worse than the average score. Again, the importance of knowing the standard
deviation is highlighted here. The standard deviation helps you locate the
relative position of the score in a distribution. The equation above gives you
the z-score, which can indicate the number of standard deviations the score
is above or below the mean. A z-score is called a standard score, simply
because it is a deviation score expressed in standard deviation units.
From the above, if 86 and 90 are your scores in the two subjects, you
can confidently say that, compared with the rest of your class, you performed
better in English than in Physics. That is because in English, your
performance is 2 standard deviations above the mean, while in Physics, you
are 2.5 standard deviations below the mean. While 90 is numerically higher,
it lies below the average performance of the class where you belong, while
86 is above the mean of its class, and even 2 standard deviations above it.
Having been transformed to the same scale, the two sets of scores can be
compared on one standard distribution.
T-Score. As you can see in the computation of the z-score, it can give a
negative number, which simply means the score is below the mean. However,
communicating a negative z-score as “below the mean” may not be
understandable to others. We would not even tell students that they got a
negative z-score. Also, a z-score may be a repeating or non-repeating
decimal, which may not be convenient for others. One option is to convert a
z-score into a T-score, which is a transformed standard score. To do this,
there is a scaling in which the z-score mean of 0 is transformed into a mean
of 50, and each z-score standard deviation unit is multiplied by 10. The
corresponding equation is:
T-score = 50 + 10z
For z = −2: T-score = 50 + 10(−2) = 50 − 20 = 30
For z = +2: T-score = 50 + 10(2) = 50 + 20 = 70
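The z-to-T conversion can be written out in a short sketch. For illustration, a class standard deviation of 3 in English is assumed, so that a score of 86 against a mean of 80 is 2 standard deviations above the mean, as in the earlier example.

```python
def z_score(x, mean, sd):
    """Standard deviations above (+) or below (-) the mean."""
    return (x - mean) / sd

def t_score(z):
    """T-score: mean of 50, each standard deviation worth 10 points."""
    return 50 + 10 * z

# Hypothetical: English score 86, class mean 80, assumed SD of 3
z = z_score(86, 80, 3)
print(z, t_score(z))   # 2.0 and 70.0
print(t_score(-2))     # 30.0, as in the worked example above
```

Note how the T-scale removes both negatives and decimals from typical results, which is exactly why it is easier to communicate than a raw z-score.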
Example:
Scores in the stanine scale have some limitations. Since they are on a 9-
point scale and expressed as whole numbers, they are not precise. Different
z-scores or T-scores may have the same stanine equivalent.
With the above percentage distribution of scores in each stanine, you can
directly convert a set of raw scores into stanine scores. Simply arrange the
raw scores from lowest to highest, and with the percentage of scores in each
stanine, we can directly assign the appropriate stanine score in each raw
score. On the interpretation of stanine scores, let us say Kate has a stanine
score of 2. We can see that her score lies in the low band between the 4th
and 11th percentile of the scores. In the same way, if John's score is in the
6th stanine, it falls between the 60th and 77th percentile, simply because 60
percent of the scores are below the 6th stanine and 23 percent of the scores
are above it. For qualitative description, stanine scores of 1, 2, and 3 are
considered as below average; 4, 5, and 6 are average, and 7, 8, and 9 are
above average. Thus, we can say that your score of 86 in English is above
average. Similarly, Kate’s score is below average while that of John is
average. Figure 11 illustrates the equivalence of the different commonly-used
standard scores.
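The percentile-to-stanine assignment described above can be sketched in code, using the standard stanine percentages (4, 7, 12, 17, 20, 17, 12, 7, 4 percent per stanine), whose cumulative cut-offs are 4, 11, 23, 40, 60, 77, 89, and 96:

```python
def stanine(percentile):
    """Stanine from a percentile rank, using the standard cumulative
    cut-offs (percent of scores below each stanine boundary)."""
    cutoffs = [4, 11, 23, 40, 60, 77, 89, 96]
    for i, cut in enumerate(cutoffs, start=1):
        if percentile < cut:
            return i
    return 9

print(stanine(50))  # 5: average
print(stanine(7))   # 2: below average (between the 4th and 11th percentile)
print(stanine(98))  # 9: above average
```

This matches the worked interpretations above: a percentile rank between 60 and 77 falls in the 6th stanine, and ranks between 4 and 11 fall in the 2nd.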
First, we will encode the two (2) sets of raw data representing X and Y
in either two columns or rows in Excel as shown in the next illustration. At any
free cell, type =CORREL(A3:A12,B3:B12) as reflected in the worksheet. The
A3:A12 are the cell addresses of the first variable X, while B3:B12 are the cell
addresses of the second variable Y. Take note that the cell ranges of the 2
variables are separated by a comma.
In contrast, the manual process is long and laborious, yet yields the
same result.
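The same computation that Excel's =CORREL performs can be written out in Python. This is a minimal sketch of the Pearson correlation coefficient; the paired scores below are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient, like =CORREL(range1, range2)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical paired scores of five students on two tests
x = [10, 12, 14, 16, 18]
y = [40, 44, 50, 55, 61]
print(round(pearson_r(x, y), 3))
```

A value near +1 indicates that students who scored high on one test tended to score high on the other; a value near −1 would indicate the opposite tendency.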
Summary
Measures of central tendency include the mean, median and mode.
Mean is the average of a given set of scores: add the scores, then
divide by the number of cases.
Median is the number in the middle when you order the numbers in an
ascending order. If there are two numbers in the middle, you should
take the average of those two numbers.
Mode is the number which is repeated the most in the set.
Measures of dispersion include the range, interquartile range, semi-
interquartile range, quartile deviation, variance and standard deviation.
The range of a dataset is the difference between the largest and
smallest values in that dataset.
The interquartile range is the middle half of the data that is in between
the upper and lower quartiles. Dividing this by 2 gives us the quartile
deviation.
Variance is the average squared difference of the values from the
mean. Unlike the previous measures of variability, the variance
includes all values in the calculation by comparing each value to the
mean.
The standard deviation is the standard or typical difference between
each data point and the mean.
Measures of location can be categorized as percentile, decile and
quartile.
A normal distribution is sometimes called the bell curve because its
distribution occurs naturally in many situations. The bell curve is
symmetrical where half of the data falls to the left of the mean and
another half falls to the right.
Standardized scores can be a z-score, T-score or stanine.
Measures of covariability tell us to a certain extent the relationship
between two tests or factors.
Assessment
A. Answer the following questions orally.
1. What are the measures of central tendency?
2. What are the measures of dispersion?
3. What are the measures of position?
4. What is covariability?
G. After all that you have done, can you now identify the elements that are
implied in the empty boxes below?
Activity 1
Form a group with four of your classmates. Here is the task you need to
work on with your team members.
Secure a set of old test papers that have been scored by a teacher or a
student teacher. It is advised that the number of cases be at least 100. It is
understood that the 100 scores came from the same test. See to it that
you observe the confidentiality of the documents you have requested. No
name should be identified in your written work; use codes to identify the
observations or cases.
Here are the tasks:
1. Prepare a data set for the test scores.
2. With the aid of your scientific calculator or Microsoft Office Excel,
find the following:
a. Mean
b. Median
c. Mode
3. Describe the type of distribution of test scores.
4. Which measure of central tendency is most appropriate to describe
the distribution? Why do you say so? Explain.
5. Select three (3) students in the list. Describe the test performance
of each student relative to the performance of all the students in the
whole class.
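For task 5, one common way to describe a student's standing relative to the whole class is a percentile rank. A sketch follows; the class scores and the convention of counting tied scores as half are assumptions for illustration:

```python
def percentile_rank(scores, raw_score):
    """Percent of scores in the group falling below a raw score.
    Ties are counted as half, a common convention (an assumption here)."""
    below = sum(1 for s in scores if s < raw_score)
    equal = sum(1 for s in scores if s == raw_score)
    return 100 * (below + 0.5 * equal) / len(scores)

# Hypothetical class scores; the student of interest scored 70
class_scores = [45, 50, 55, 60, 60, 65, 70, 75, 80, 90]
rank = percentile_rank(class_scores, 70)   # -> 65.0
```

A rank of 65 would be read as: the student scored as well as or better than about 65% of the class.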
Activity 2
Interview a teacher on how she decides to pass or fail a student. Report the
specific questions that you asked and the corresponding responses you
captured in this interview activity. Your analysis and presentation should
reflect the application of the measures of central tendency, measures of
variability, and measures of location.
Activity 3
With your team members, make a visit to any of the following offices of a
school:
1. Office of the Guidance Counselor
2. Testing Center
3. Office of Student Affairs
Request for any of the following data which you can avail of:
1. IQ test scores
2. Aptitude test scores
3. Admission test scores
4. Qualifying examination scores
5. Any psychological test scores
It is advised that the number of observations be at least 50; the larger
the number of cases, the better.
Please emphasize that the data will be held in full confidentiality. As an
ethical consideration, the office has the option not to give the names of the
examinees. Apply a coding technique to label the different observations.
From the test scores that you have gathered, do the following:
1. Make a frequency distribution of the test scores.
B. 10 D. 13
4. What does it mean when a student got a score at the 70th percentile on
a test?
A. The performance of the student is above average
B. The student answered 70 percent of the items correctly
C. The student got at least 70 percent of the correct answers
D. The student's score is equal to or above the scores of 70 percent of
the other students in the class
5. Which best describes a normal distribution?
A. Positively skewed C. Symmetric
B. Negatively skewed D. Bimodal
6. What does a large standard deviation indicate?
A. Scores are not normally distributed.
B. Scores are not widely spread, and the median is an unreliable
measure of central tendency.
C. Scores are widely distributed, and the mean may not be a reliable
measure of central tendency.
D. Scores are not widely distributed, and the mean is recommended
as a more reliable measure of central tendency.
7. In a normal distribution, approximately what percentage of scores is
expected to fall within three standard deviations from the mean?
A. 34% C. 95%
B. 68% D. 99%
8. Which of the following is interpreted as the percentage of scores in a
reference group that falls below a particular raw score?
A. Standard scores C. Reference group
B. Percentile rank D. T-score
9. For the data illustrated in the scatter plot
below, what is the reasonable product-
moment correlation coefficient?
A. 1.0
B. -1.0
C. 0.90
D. -0.85
10. A Pearson test statistic yields a correlation coefficient (r) of 0.90.
If X represents scores on a vocabulary test and Y the reading
comprehension test scores, which of the following best
explains r = 0.90?
A. The degree of association between X and Y is 81%.
B. The strength of the relationship between vocabulary and reading
comprehension is 90%.
C. There is almost perfect positive relationship between vocabulary
test scores and reading comprehension.
a. How do you find the mean if data are grouped? What is the most
appropriate class interval for this set of data? What other information
do you need to generate to compute the mean?
b. How do we determine the median of the scores?
In what class interval did the median fall?
What is the cumulative frequency below the median class?
What is the frequency of the median class?
What is the lower limit of the median class?
What is the median equal to?
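The questions in item (b) walk through the grouped-data median formula, Md = L + ((n/2 − cf) / f) × h. A sketch follows, assuming equal class widths; the frequency distribution is hypothetical:

```python
def grouped_median(intervals):
    """Median for grouped data: Md = L + ((n/2 - cf) / f) * h, where L is the
    lower boundary of the median class, cf the cumulative frequency below it,
    f its frequency, and h the class width. `intervals` is a list of
    (lower_boundary, frequency) pairs of equal width, in ascending order."""
    n = sum(f for _, f in intervals)
    h = intervals[1][0] - intervals[0][0]
    cf = 0
    for lower, f in intervals:
        if cf + f >= n / 2:        # the class where the middle case falls
            return lower + ((n / 2 - cf) / f) * h
        cf += f

# Hypothetical frequency distribution: (lower class boundary, frequency)
data = [(9.5, 4), (19.5, 6), (29.5, 10), (39.5, 7), (49.5, 3)]
md = grouped_median(data)   # -> 34.5
```

Here n = 30, the median class is 29.5-39.5 (cf = 10, f = 10, h = 10), so Md = 29.5 + (15 − 10)/10 × 10 = 34.5, mirroring the step-by-step questions above.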
P. Take hold of a scientific calculator. Use the raw score formula in finding
the standard deviation. If you have Excel or other statistical software,
you can work directly on the data.
1. What is the variance?
2. What is the standard deviation?
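The raw score formula referred to in item P computes the variance from the sums ΣX and ΣX² without first listing each deviation from the mean. A sketch, assuming the population form of the formula and hypothetical scores:

```python
import math

def raw_score_sd(scores):
    """Population variance and SD via the raw score formula:
    variance = (sum(X^2) - (sum(X))^2 / n) / n."""
    n = len(scores)
    sum_x = sum(scores)
    sum_x2 = sum(x * x for x in scores)
    variance = (sum_x2 - sum_x ** 2 / n) / n
    return variance, math.sqrt(variance)

scores = [10, 12, 12, 15, 18, 20, 25]
variance, sd = raw_score_sd(scores)
```

This agrees with the deviation-score definition of the variance; the raw score form is simply more convenient on a calculator because only running sums are needed.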
Q. Assuming that the scores are normally distributed, what is the range of
scores that would fall between:
1. ±1σ from the mean. Explain your answer.
2. ±2σ from the mean. How did you determine this range of scores?
3. ±3σ from the mean. Explain your answer.
4. Define the 25th percentile. How did you determine it?
5. Where is the 75th percentile? How did you locate it? How many fall on
this percentile?
6. How many got a percentile rank of 99? How did you determine it?
Explain.
R. With the help of your scientific calculator, Excel, or other statistical
software, load a data set - either hypothetical or actual data - that you
have accessed from available documents or research studies. Analyze the
data and enter the values into the following table:
Mean
Median
Mode
Range
Variance
Standard Deviation
Skewness
Kurtosis
Examine the values you have written in the table. Discuss the graph in
relation to the values in the table.
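Python's standard statistics module covers most of the table, but not skewness and kurtosis; a sketch computing them as standardized third and fourth moments (population form assumed; the data are hypothetical):

```python
import statistics

def shape_moments(scores):
    """Population skewness and kurtosis as standardized third and fourth
    moments (under this definition, a normal curve has kurtosis of 3)."""
    n = len(scores)
    mean = statistics.mean(scores)
    sd = statistics.pstdev(scores)
    skew = sum(((x - mean) / sd) ** 3 for x in scores) / n
    kurt = sum(((x - mean) / sd) ** 4 for x in scores) / n
    return skew, kurt

# Hypothetical scores with a tail toward the high end
data = [10, 12, 12, 15, 18, 20, 25]
skew, kurt = shape_moments(data)   # skew > 0: positively skewed
```

A positive skew indicates the tail points toward the higher scores; a negative skew, toward the lower ones. Note that some software reports "excess kurtosis" (the value above minus 3).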
Educator’s Input
In many programs, topics on measures of central tendency, measures
of variability, and measures of correlation are generally taught by mathematics
teachers. This is because these topics are core content in Statistics, and for
many, statistics is mathematics. Many personal encounters with students
made me think that instructional discourse on the subject of Measurement
and Assessment of Learning that involves statistics tends to be restricted to
performing mathematical procedures, such as calculating the mean, median,
mode, standard deviation, etc. My graduate students who are already
practitioners in their own field have shared their year-end reports of student
performance, including computations of the mean, median, mode, and
standard deviation. I am happy to hear that they did not encounter difficulties
in mechanical computations with scientific calculators and Microsoft Excel.
When I asked why they had to compute all three measures of central
tendency for their student performance report, they could not give me a
convincing reason. I further asked what they did with the standard deviation
and how they utilized all those numerical values, and I did not get a logical
justification. While they were confident in stating the conceptual meaning of
the mean, median, mode, and measures of variability, their grasp of the "big
ideas" that underlie these statistical concepts and of their function in the
improvement of learning appeared very much wanting. Whether or not I had
been guilty of being too procedural in my own teaching of Statistics during my
early years, I later discovered a teaching approach that deepens my
students' understanding of measurement theories, concepts, and principles.
The strategy of “posing relevant scenario” or simply “scenario-posing” for
students to examine and explore can be effective. From the scenario, I can
generate prompts to invite my students to participate in the class discussion.
Most likely, students become interested because in presenting a scenario, I
create a story. I want to believe that a story-form presentation works across
age levels, not only for young children. In the early years, the scenarios I
presented were mostly hypothetical; my objective was primarily to dramatize
the main idea embodied in the statistical measure. In later years, these
scenarios became more real and effortless based on my actual experiences
as a teacher educator and a researcher.
In presenting the scenario, I always have in mind the “seed concepts”,
which are in coherence with my lesson objectives. I have been practicing this
approach for quite some time, and I think teachers using this strategy will be
able to deepen student knowledge by eliciting from them the nuances of the
information embodied in the story problem. These examples may be worth
sharing:
1. The town I live in had a population of 5,000 native people in 2017, and
the mean income per person was Php 30,000.00. Now, suppose Mr.
Manuel, a millionaire from a distant region, moves to my town in
2018. Let us say that the income of Mr. Manuel was Php
120,000,000.00. So, with 5,001 people now living in my town, what is
the mean income of my town? Php 53,989.00? Does this information
indicate that the 5,000 natives in my town suddenly made Php 23,989
more in income in 2018?
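The scenario's arithmetic can be checked in a few lines; the sketch also shows why the median is the safer "typical income" when a single outlier is present:

```python
import statistics

# The scenario above: 5,000 residents each earning Php 30,000,
# then one new resident earning Php 120,000,000
incomes = [30_000] * 5_000 + [120_000_000]

mean_income = statistics.mean(incomes)      # about 53,989: pulled up by one outlier
median_income = statistics.median(incomes)  # 30,000: unchanged, the typical income
```

This is the "seed concept" of the scenario: a single extreme value distorts the mean, while the median stays put.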
References
Cheusheva, Svetlana (September 4, 2020). Mean, median and mode in
Excel. Retrieved from https://www.ablebits.com/office-addins-
blog/2017/05/24/mean-median-mode-excel/
David et al. (2020). Assessment in Learning 1. Manila: Rex Book Store.
De Guzman, E. and Adamos, J. (2015). Assessment of Learning 1. Quezon
City: Adriana Publishing Co., Inc.
Frost, Jim (n.d.). Measures of Variability: Range, Interquartile Range,
Variance, and Standard Deviation. Retrieved from
https://statisticsbyjim.com/basics/variability-range-interquartile-
variance-standard-deviation/.
McLeod, Saul (2019). Introduction to the Normal Distribution (Bell Curve).
Retrieved from https://www.simplypsychology.org/normal-
distribution.html
Mean, Median, Mode: What They Are, How to Find Them. Retrieved from
https://www.statisticshowto.com/probability-and-statistics/statistics-
definitions/mean-median-mode/
Statisticsfun (September 20, 2009). How to calculate Standard Deviation,
Mean, Variance Statistics, Excel. [Video]. YouTube:
https://www.youtube.com/watch?v=efdRmGqCYBk.
Pre-discussion
Grading and reporting learners’ test performance is a complex task. It
requires specific knowledge, skills, and experience. To perform successfully
the assigning of grades and reporting of level of performance or achievement,
pre-service teachers should be able to understand its purpose, identify
different methods of scoring and grading test performance, differentiate the
various types of test scores, and interpret test results based on norms and
pre-set standards.
What to Expect?
At the end of the lesson, the students can:
1. define what grading is;
Definition of Grading
In his paper, Magno (2010) discussed that an effective and efficient way
of recording and reporting evaluation results is very important and useful to
the persons concerned in the school setting. Hence, it is very important that
students' progress is recorded and reported to them, their parents, teachers,
school administrators, and counselors, because this information will be used
to guide and motivate students to learn and to establish cooperation and
collaboration between the home and the school. It is also used in certifying
students' qualifications for higher educational levels and for employment. In
the educational setting, grades are used to record and report students'
progress.
Grades are essential in education because they allow students' learning
to be assessed, quantified, and communicated. Every teacher needs to assign
grades based on assessment tools such as tests, quizzes, projects, and so
on. Through these grades, achievement of learning goals can be
communicated to students, parents, teachers, administrators, and counselors.
However, it should be remembered that grades are just one part of
communicating student achievement; they must therefore be used alongside
additional feedback methods.
Grading implies (a) combining several assessments, (b) translating the
result into some type of scale that has evaluative meaning, and (c) reporting
the result in a formal way. From this definition, we can clearly say that grading
is more than the quantitative value many see it as; rather, it is a process.
Grades are frequently confused with scores, but it must be clarified that
scores make up the grades. Grades are what is written in students' report
cards, which compile students' progress and achievement throughout a
quarter, a trimester, a semester, or a school year.
Grades are symbols used to convey the overall performance or
achievement of a student and they are frequently used for summative
assessments of students. Take for instance two long exams, five quizzes, and
ten homework assignments as requirements for a quarter in a particular
subject area. To arrive at grades, a teacher must be able to combine scores
from the different sets of requirements and compute or translate them
according to the assigned weights or percentages. Then, he or she should
also be able to design effective ways to communicate the results to students,
parents, administrators, and others who are concerned. Another term
commonly used to refer to the process is marking. Figure 1 shows a
graphical summary of the grading process.
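The combining-and-weighting step described above can be sketched in a few lines; the component names, weights, and scores below are hypothetical, not a prescribed scheme:

```python
# Hypothetical components: (weight, average score in percent).
# The weights are an assumption and must sum to 1.0.
components = {
    "long_exams": (0.50, 84),
    "quizzes":    (0.25, 88),
    "homework":   (0.25, 92),
}

# Weighted combination of the component scores into a single grade
grade = sum(weight * score for weight, score in components.values())  # -> 87.0
```

The resulting number would then be translated into whatever evaluative scale the school uses (letter grades, descriptors, and so on) before being reported.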
There are various reasons why we assign grades and report learners’
test performance. Grades are alphabetical or numerical symbols/marks that
indicate the degree to which learners are able to achieve the intended
learning outcomes. Grades do not exist in a vacuum but are part of the
instructional process and serve as a feedback loop between the teacher and
learners. They are one of the ways to communicate the level of learning of the
learners in specific course content. They give feedback on what specific
topic/s learners have mastered and what they need to focus on more when they
review for summative assessment or final exams. In a way, grades serve as a
motivator for learners to study and do better in the next tests to maintain or
improve their final grade.
Grades also provide the parents, who have the greatest stake in
learners' education, with precise information about their children's achievements.
They give teachers the bases for improving their teaching and learning
practices and for identifying learners who need further educational
intervention. They are also useful to school administrators who want to
evaluate the effectiveness of the instructional programs in developing the
needed skills and competencies of the learners.
Magno (2010) emphasized that the purposes of grading in the
educational setting can be categorized into four major parts.
Administrative Purposes
At their end, school administrators can use the grades of students for
more general purposes than teachers do: they can utilize grades to evaluate
programs, identify and assess areas that need to be improved, and determine
whether or not the curriculum goals and objectives of the school have been
attained by the students through their institution.
Promotion and Retention. Grades can serve as one factor in determining
whether a student will be promoted to the next level or not. Through a
student's grades, it can be determined whether he or she has acquired the
skills and competencies required for a certain level and achieved the
curriculum goals and objectives of the school and/or the state. In some
schools, the grade of a student is a factor taken into consideration for his/
her eligibility to join extracurricular activities (performing and theater arts,
varsity teams, cheering squads, etc.). Grades are also used to qualify a student
to enter high school or college in some cases. Other policies may arise
depending on the schools’ internal regulations. At times, failing marks
may prohibit a student from being part of the varsity team, running for
office, joining school organizations, and enjoying other privileges that
students with passing grades get. In some colleges and universities, students who
get passing grades are given priority in enrolling for the succeeding term,
as compared to students who get failing grades.
Placement of Students and Awards. Grades can be used for the placement
of students. They are factors to be considered in placing students
according to their competencies and deficiencies, through which teaching
can be more focused on developing the strengths and improving the
weaknesses of students. For example, students who consistently get high,
average, or failing grades are each placed in one section, wherein
teachers can focus more on and address the students' needs and
demands to ensure a more productive teaching-learning process. Another,
more domain-specific example would be grouping students having the
same competency in a certain subject together. Through this strategy,
students who have high ability in Science can further improve their
knowledge and skills by receiving more complex and advanced topics and
activities at a faster pace, while students having low ability in Science can
receive simpler and more specific topics at a slower pace. Aside from the
placement of students, grades are frequently used as the
basis for academic awards. Almost all schools, colleges, and universities
have honor rolls and dean's lists to recognize student achievement and
performance. Grades also determine graduation awards for the overall
achievement or excellence a student has garnered throughout his/her
education, whether in a single subject or in the whole program taken.
Program Evaluation and Improvement. Through the grades of students
taking a certain program, its effectiveness can be evaluated to some
extent. Grades of students can be a factor used in determining
whether the program was effective or not. Through the evaluation
process, some factors that might have affected the program’s
effectiveness can be identified and minimized to improve the program
further for future implementations.
Admission and Selection. Organizations external to the school also use
grades as a reference for admission. When students transfer from one
school to another, their grades play a crucial role in their admission. Most
colleges and universities also use students' grades in their senior year of
high school, together with the scores they acquire on the entrance exam.
However, grades from academic records and high-stakes tests are not the
sole basis for admission; some colleges and universities also require
recommendations from the school, teachers, and/or counselors about
students' behavior and conduct. The use of grades is not limited to the
educational context. They are also used in employment for job selection
purposes, and at times even by insurance companies that use grades as a
basis for giving discounts on insurance rates.
For the above item, the correct answer is c (X = -32), and this response will
merit a score. Responses other than c will be given zero (0) points.
Item # Score
1 1
2 0
3 -0.25
4 1
5 1
6 0
7 -0.25
8 1
9 1
10 1
Total 6 + (-0.50) = 5.5
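The scoring in the table can be reproduced in code. A sketch, assuming 1 = correct, 0 = omitted, and −0.25 = a wrong answer under a correction-for-guessing penalty:

```python
# Item scores from the table above, assuming 1 = correct, 0 = omitted,
# and -0.25 = wrong answer (a correction-for-guessing penalty)
item_scores = [1, 0, -0.25, 1, 1, 0, -0.25, 1, 1, 1]

earned = sum(s for s in item_scores if s > 0)     # points from correct answers -> 6
penalty = sum(s for s in item_scores if s < 0)    # total deduction -> -0.5
total = earned + penalty                          # -> 5.5
```

This reproduces the table's total of 6 + (−0.50) = 5.5.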
Example:
Linda obtained a score of 55% in her Reading Test. What does her score
mean? Justify your answer.
a. Linda got 55% of the test items correct.
b. Linda was able to answer correctly more than half of the items.
c. Linda obtained a raw score lower than those obtained by 55% of her
peers.
d. If the test has 60 items, Linda would probably have 33 correct answers.
For this item, each response option has an assigned score with its
corresponding rationale. An example of how the item can be scored is shown below:
Option Points Rationale
A 3 Since the score was presented in percent, this is the correct
interpretation.
B 1 While the interpretation may be correct, it does not give a
more specific meaning to the score. Besides, the same
interpretation can also apply to any score higher than
51%.
C 0 This interpretation is wrong, as it applies to a score at the
55th percentile rank.
D 2 This interpretation gives an example of how the score was
computed.
same or equivalent final exams and apply the formula for standard scores to
compute for the percentile ranks for each range of scores. On the other hand,
passing grades/scores are usually set by the department or the school based
on their standards (e.g., A (90-100 percent), B (80-89 percent), C (70-79
percent), or F (0-69 percent).
Marking or scoring constructed-response tests, such as essay and
performance tests, requires standardized scoring schemes so that scores are
reliable and have the same valid meaning for all learners. There are four
types of rating scales for the assessment of writing, which can also be applied
to other authentic or performance-type assessment. These four types of
scoring are (1) Holistic, (2) Analytic, (3) Primary Trait, and (4) Multiple Trait
scoring.
Holistic Scoring. It involves giving a single, overall assessment score for an
essay, writing composition, or other performance-type assessment as a
whole. Although the scoring rubric for holistic scoring lays out specific
criteria for evaluating a task, raters do not assign a score for each
criterion. Instead, as they read a writing task or observe a performance
task, they balance strengths and weaknesses among the various criteria
to arrive at an overall assessment. Holistic scoring is considered efficient
in terms of time and cost. It also does not penalize poor performance
on only one aspect (e.g., content, delivery, organization,
vocabulary, or coherence for an oral presentation). However, it is said that
holistic scoring does not provide sufficient diagnostic information about
the students' ability, as it does not identify areas for improvement, and it
is difficult to interpret as it does not detail the basis for evaluation.
The following is an example of a rubric for an oral presentation:
Rating/Grade Characteristics
A Is very organized. Has a clear opening statement that catches
(Exemplary) audience’s interest. Content of report is comprehensive and
demonstrates substance and depth. Delivery is very clear and
understandable. Uses slides/multimedia equipment effortlessly
to enhance presentation.
B Is mostly organized. Has opening statement relevant to topics.
(Satisfactory) Has appropriate pace and without distracting mannerisms.
Looks at slides to keep on track.
C Has opening statement relevant to topic but does not give
(Emerging) outline of speech; is somewhat disorganized. Lacks content and
the students’ test performance can be compared with one another in the
class or with their peers in another section. In the same manner, the
percentage score is suitable to use in subjects wherein a standard has
been set. For example, if an algebra subject sets a passing score of 60%
on a test (e.g., 60% is considered average), the teachers and
learners would know whether a learner has met the desired level of
competencies through his/her percentage score.
Aside from the above test scores, the decision on what type of test
scores to use is based on whether the learners’ test performance is to be
compared with a standard or criterion or with the scores of other learners
or peers. This decision will entail the choice between the two major types
of grading system: 1) criterion-referenced; and 2) norm-referenced grading
system.
3. Criterion-Referenced Grading System. This is a grading system wherein
learners’ test scores or achievement levels are based on their
performance in specified learning goals and outcomes and performance
standards. Criterion-referenced grades provide a measure of how well
the learners have achieved the preset standards, regardless of how
everyone else does. It is therefore important that the desired outcomes
and the standards that determine proficiency and success are clear to the
learners at the very start. These should be indicated in the course
syllabus. Criterion-referenced grading is premised on the assumption that
learners’ performance is independent of the performance of the other
learners in their group/class.
The following are some of the types of criterion-referenced scores
or grades:
a. Pass or Fail Grade. This type of score is most appropriate if the
purpose of the test or assessment is primarily or entirely to make a
pass or fail decision. In this type of scoring, a standard or cut-off score
is preset, and a learner is given a score of Pass if he or she surpasses
the expected level of performance or the cut-off score. Pass or Fail scoring is most
appropriate for comprehensive or licensure exams because there is no
limit to the number of examinees who can pass or fail. Each individual
level. However, it would be best that these descriptors are paired with
specific performance indicators that identify the qualitative differences
between grade categories.
Another disadvantage of letter grades is that the cut-offs
between grade categories are always arbitrary and difficult to justify.
For example, if a grade of C covers scores from 76 to 85, learners who get a
score of 76 in a writing test and those who receive a score of 85 will
both get the same letter grade of C despite the nine-point difference.
Now, if the next range of grades is 86 to 96, then the one who gets an
86 receives a grade of B although it is just one score higher than 85,
which receives a grade of C. Furthermore, letter grades lack the
richness of more detailed grading methods.
c. Plus (+) and Minus (-) Letter Grades. This grading provides a more
detailed description of the level of learners' achievement or task/test
performance by dividing each grade category into three levels, such
that a grade of A can be assigned as A+, A, and A-; B as B+, B, and B-;
and so on. Plus (+) and minus (-) grades provide a finer discrimination
between achievement or performance levels. They also increase the
accuracy of grades as a reflection of learners' performance; enhance
student motivation (i.e., to get a high A rather than an A-); and
discriminate among performances in a very similar pool of learners, such as
those in advanced courses or star sections. However, the +/- grading system is
viewed as unfair, particularly for learners in the highest category; it creates
stress for learners; and it is more difficult for teachers, as they need to deal with
more grade categories when grading learners. Examples of the descriptors for
plus (+) and minus (-) letter grades are presented in the next matrix:
(+)/(-) Letter Grades Interpretation
A+ Excellent
A Superior
A- Very Good
B+ Good
B Very Satisfactory
B- High Average
C+ Average
C Fair
C- Pass
D Conditional
E/F Failed
d. Standard Scores. These are raw scores converted into a common
scale of measurement that provides a meaningful description of
the individual scores within the distribution. A standard score describes
the difference of a raw score from the sample mean, expressed in
standard deviation units. The two most commonly used standard scores
are the (1) z-score and (2) T-score.
i. Z-score. The z-score is a standard score with a mean of 0 and
a standard deviation of 1. It is computed using the following
formula:
z = (X - M) / SD
where X is the raw score, M is the mean of the group, and SD is
the standard deviation.
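A sketch of the z-score, together with the related T-score (T = 50 + 10z); the raw scores, class means, and standard deviations below are hypothetical:

```python
def z_score(raw, mean, sd):
    """z = (X - M) / SD: distance of a raw score from the mean in SD units."""
    return (raw - mean) / sd

def t_score(raw, mean, sd):
    """T = 50 + 10z: a standard score with mean 50 and SD 10."""
    return 50 + 10 * z_score(raw, mean, sd)

# Two scores both 5 points above their class means, but the classes
# differ in variability (all numbers hypothetical):
luis_z = z_score(80, 75, 1)      # -> 5.0 (SD of 1 in Luis' class)
michael_z = z_score(80, 75, 5)   # -> 1.0 (SD of 5 in Michael's class)
```

The same raw-score advantage thus translates into very different standard scores once each class's variability is taken into account.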
Standard scores are useful when you want to compare learners’ test
While the difference between the raw scores of Luis and Michael from
their class means is the same (i.e., 5), Michael's standard score is lower
than Luis' standard score (z of 1 vs. z of 5). This is because the
variability in scores in Michael's class is higher than that in Luis'
1. Identify the criteria for rating the essay. The criteria or standards for
evaluating the essay should be predetermined. Some of the criteria that
can be used include content, organization/format, grammar proficiency,
development and support, focus and details, etc. It is important that the
specific standards and criteria included are relevant to the type of
performance tasks given.
2. Determine the type of rubric to use. There are two basic types of rubric:
the holistic and the analytic scoring system. A holistic rubric requires
evaluating the essay while taking all the criteria into consideration; only a
single score is given, based on the overall judgment of the learner's writing
composition. The holistic rubric is viewed as more convenient for teachers,
as it requires fewer areas or aspects of writing to evaluate. However, it does
not provide specific feedback on which course topics/content or criteria the
students are weak at and need to improve on. On the other hand, the analytic
scoring system requires that the essay be evaluated on each of the
criteria. It provides useful feedback on the learner's strengths and weaknesses
for each course content or criterion.
3. Prepare the rubric. In developing a rubric, the skills and competencies related to
essay writing should first be identified; these skills and competencies represent
the criteria. Then, performance benchmarks and point values are determined.
Performance benchmarks can be numerical categories, but the most frequently used
are descriptors with a corresponding rating scale.
Point Values   Sample Performance Benchmarks
1              Needs Improvement   Beginning      Novice         Inadequate
2              Satisfactory        Developing     Apprentice     Developing
3              Good                Accomplished   Proficient     Proficient
4              Exemplary           Exceptional    Distinguished  Skilled
Illustrative Example:
Assuming that a student has obtained the following raw scores in the different
components in English subject:
Components Total Score Total Possible Score
Written Works 145 160
Performance Tasks 100 120
Quarterly Assessment 50 50
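One way to combine these components is to convert each to a percentage score and weight it. The weights below follow the DepEd K to 12 weighting for languages (WW 30%, PT 50%, QA 20%); treat them as an assumption for this sketch, since the actual weights vary by subject:

```python
# Components from the example above: (raw score, total possible, weight).
# Weights assume the DepEd language weighting (WW 30%, PT 50%, QA 20%).
components = {
    "written_works":        (145, 160, 0.30),
    "performance_tasks":    (100, 120, 0.50),
    "quarterly_assessment": (50,  50,  0.20),
}

initial_grade = sum(
    (score / possible) * 100 * weight   # percentage score times its weight
    for score, possible, weight in components.values()
)
# round(initial_grade, 2) -> 88.85
```

Under DepEd guidelines, this initial grade would then be converted to the reported grade using the transmutation table.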
Summary
In this lesson, we were able to discuss exhaustively the purposes of grading
and communicating learners' test performance, the various methods of
marking or scoring tests and performance tasks, the different methods of
grading learners' performance in assessments, the types of test scores, general
guidelines in grading tests or performance tasks, general guidelines in scoring
essay tests, and how test results can be communicated. Finally, the guidelines
on classroom assessment of the DepEd K to 12 Basic Education Program were
likewise highlighted.
Enrichment
1. Read the following articles:
1. Magno, C. (2010). The Functions of Grading. The Assessment
Handbook, Vol. 3.
2. Guskey, T. R. (2001). Grading and Reporting Student Learning. Corwin
Press: KY, USA.
3. Brookhart, Susan M. (2013). How to Create and Use Rubrics for
formative assessment and grading. Virginia, USA: ASCD.
2. Watch this video:
Nancy Heilbronner (2019, April 2). Grading and Reporting. [Video].
YouTube: https://www.youtube.com/watch?v=SHBQTbymAP4
Assessment
A. Let us review what you have learned about grading and communicating
test results.
1. What are the purposes of grading and communicating learners’ test
performance?
2. What are the different methods in marking or scoring tests or
performance tasks?
ERNIE C. CERADO, PhD/MA. DULCE P. DELA CERNA, MIE 266
SULTAN KUDARAT STATE UNIVERSITY
B. After the discussion on grading and reporting test scores, you are now
ready to identify the methods of scoring/grading and types of scores that
you can employ in your assessments. Let us apply what you have learned
by extending the assessment plan that you developed in an earlier
lesson, or you may develop a new one. In addition to the desired learning
outcomes, course topics, and test formats that you have listed down for
each subject, please identify the methods of scoring, types of grades, and
reporting strategies that you will employ.
C. Let us then come up with a grading and reporting scheme for each type of
assessment that you will employ in each of your subjects. In the
development of the grading and reporting scheme, you need the following
information:
1. Purpose of Assessment: Why is this assessment being conducted?
Is it for learners’ monitoring and improvement (formative), or is it for
demonstrating student achievement (summative)?
2. Desired Learning Outcomes for the Topic/Subject Area: What are
the learning outcomes expected from the learners for this unit/subject?
3. Type of Assessment: How will each outcome be measured?
4. Grading Criteria: What are the criteria to include that demonstrate
achievement of the stated desired learning outcomes?
5. Scoring/Grading Method: How will the test/performance tasks be
scored?
6. Type of Score: What types of scores are appropriate to indicate
the students’ level of achievement or performance?
D. Evaluate the sample grading and reporting scheme that you have developed for
each assessment by using the rubric below.

Criteria: Purpose of the Test
- Inadequate (1): The purpose of testing is not specified in the grading and reporting system suggested for the subject area covered.
- Developing (2): The purpose of testing is specified; however, it is not clear or relevant to the grading and reporting system suggested for the subject area covered.
- Proficient (3): The purpose of testing is clearly specified and relevant to the grading and reporting system suggested for the subject area covered.

Criteria: Identification of Intended Learning Outcomes
- Inadequate (1): The intended learning outcomes in the unit/topic/course are not identified and specified in the grading and reporting scheme.
- Developing (2): The intended learning outcomes are listed, but they are not clearly described.
- Proficient (3): The intended learning outcomes in the unit/topic/course are explicitly specified.

Criteria: Types of Tests
- Inadequate (1): The tests by which students’ level of achievement is measured are not valid and appropriate for measuring the extent to which learners have achieved the intended outcomes.
- Developing (2): The tests are appropriate, but they will not provide a complete and valid measure of the extent to which learners have achieved the intended outcomes.
- Proficient (3): The tests will provide an adequate and accurate measure of the extent to which the learners have achieved the intended outcomes.
G. Evaluate your skills in identifying and using appropriate grading and reporting
techniques based on the following scale. Rate yourself separately on using
different scoring techniques and on using different types of scores/grades.

- Proficient (4): I know them very well. I can teach others where and when to use them appropriately.
- Master (3): I can do it by myself, though I sometimes make mistakes.
- Developing (2): I am getting there, though I still need help to be able to perfect it.
- Novice (1): I cannot do it myself. I need help to make an effective grading and reporting scheme.
H. Based on your self-assessment above, choose among the following tasks to help you
enhance your skills and competencies in developing different scoring and grading
techniques:

- Proficient: Help or mentor peers/classmates who are having difficulty in understanding the different techniques in scoring and grading test results.
- Master: Examine the areas that you need to improve on and address them immediately. Benchmark against the scoring and grading schemes developed by peers/classmates who are known to be proficient in this area.
- Developing/Novice: Read more books/references about scoring techniques and types of scores. Ask your teacher to evaluate the grading and reporting scheme that you have developed and to give suggestions on how you can improve it.
Educator’s Input
References
David et al. (2020). Assessment in Learning 1. Manila: Rex Book Store.
De Guzman, E. and Adamos, J. (2015). Assessment of Learning 1. Quezon
City: Adriana Publishing Co., Inc.
D.O. No. 8, s. 2015 (Policy Guidelines on Classroom Assessment for the K to
12 Basic Education Program)
UNIVERSITY VISION
A trailblazer in arts, science and technology in the region.

UNIVERSITY MISSION
The University shall primarily provide advanced instruction and professional training in science and technology, agriculture, fisheries, education and other related fields of study. It shall undertake research and extension services, and provide progressive leadership in its areas of specialization.

UNIVERSITY GOAL
To produce graduates with excellence and dignity in arts, science and technology.

UNIVERSITY OBJECTIVES
a. Enhance competency development, commitment, professionalism, unity and true spirit of service for public accountability, transparency and delivery of quality services;
b. Provide relevant programs and professional trainings that will respond to the development needs of the region;
c. Strengthen local and international collaborations and partnerships for borderless programs;
d. Develop a research culture among faculty and students;
e. Develop and promote environmentally-sound and market-driven knowledge and technologies at par with international standards;
f. Promote research-based information and technologies for sustainable development;
g. Enhance resource generation and mobilization to sustain the financial viability of the university.
h. Facilitate learning using a wide range of teaching methodologies and delivery modes appropriate to specific learners and their environments;
i. Develop innovative curricula, instructional plans, teaching approaches, and resources for diverse learners;
j. Apply skills in the development and utilization of ICT to promote quality, relevant, and sustainable educational practices;
k. Demonstrate a variety of thinking skills in planning, monitoring, assessing, and reporting learning processes and outcomes;
l. Practice professional and ethical teaching standards sensitive to the local, national, and global realities;
m. Pursue lifelong learning for personal and professional growth through varied experiential and field-based opportunities;
n. Demonstrate in-depth understanding of the diversity of learners in various learning areas;
o. Manifest meaningful and comprehensive pedagogical content knowledge (PCK) of the different subject areas;
p. Utilize appropriate assessment and evaluation tools to measure learning outcomes;
q. Manifest skills in communication, higher-order thinking skills, and use of tools and technology to accelerate learning and teaching;
r. Demonstrate positive attitudes of a model teacher, both as an individual and as a professional; and
s. Manifest a desire to continuously pursue personal and professional development.
h. Enhance the quality of tests through judgmental test-improvement and other empirically-based procedures;
i. Ensure the validity and reliability of the constructed test;
j. Organize the data derived from tests using tables and charts;
k. Use statistics to analyze, interpret, and use test data in decision making; and
l. Observe the guidelines in test scoring and grading as well as its methods of reporting.
7. Course Contents
Each lesson entry below lists the topics and time allotment, the desired student learning outcomes, the outcomes-based assessment (OBA) activities, the evidence of learning, the course outcomes and program objectives addressed, and the values integrated.

Lesson 0. Course Orientation (3 hours)
Topics: Course Syllabus; Basic academic policies
Desired Student Learning Outcomes:
1. Explain the vision and mission, and the significant academic policies of the University
2. Enumerate the course desired learning outcomes
3. Use the syllabus as reference for independent learning
4. Simulate the computation of one’s grades given the criteria
OBA Activities: Recite sincerely the University Vision and Mission; Involvement in the G-class
Evidence of Learning: Oral Recitation (OR); Class Participation Rating (CPR)
Course Outcomes: a
Program Objectives: a, b
Values Integration: Accountability, Excellence
Topics: Meaning of Learning Assessment, Evaluation and Measurement; Principles in Assessing Learning; Grading and Testing
Desired Student Learning Outcomes:
2. Compare assessment with measurement and evaluation
3. Discuss testing and grading
4. Explain the different principles in assessing learning
5. Relate an experience as a student or pupil related to each principle
6. Comment on the tests administered by past teachers
7. Perform simple evaluation
OBA Activities: Testing and grading practices of past teachers through a case presentation; Self-assessment as contained in the last part of the module; Involvement in the G-class
Evidence of Learning: Exercises/Quiz Scores (EQS); Case Report Rating (CRR); Quiz; Class Participation Rating (CPR)
Values Integration: Transparency, Justice
Lesson 2. Assessment Purposes, Learning Objectives/Targets and Appropriate Methods (4.5 hours)
Topics: Purpose of Classroom Assessment; Bloom’s Taxonomy of Educational Objectives; Learning Objectives; Learning Targets; Matching Appropriate Assessment Methods
Desired Student Learning Outcomes:
1. Articulate the purpose of classroom assessment
2. Tell the difference between Bloom’s Taxonomy and the Revised Bloom’s Taxonomy in stating learning objectives
3. Apply the Revised Bloom’s Taxonomy in writing learning objectives
4. Discuss the importance of learning targets in instruction
5. Formulate learning targets
6. Match the assessment methods with specific learning objectives/targets
OBA Activities: Completion of Table of Learning Objectives/Targets; Presentation of matrix of learning targets and methods of assessment; Self-assessment as contained in the last part of the module; Involvement in the G-class
Evidence of Learning: Exercises/Quiz Scores (EQS); Case Report Rating (CRR); Quiz; Class Participation Rating (CPR)
Course Outcomes: d
Program Objectives: b, g, h, p, r
Values Integration: Objectivity, Justice, Truthfulness
Topics: Test Improvement (Teacher’s Own Review, Peer Review, Student Review); Other Empirically-Based Procedures (Difficulty Index, Index of Discrimination, Distracter Analysis)
Desired Student Learning Outcomes:
1. … judgmental item-improvement and other empirically-based procedures
2. Evaluate which type of test item-improvement is appropriate to use
3. Compute and interpret the results for index of difficulty, index of discrimination and distracter efficiency
4. Demonstrate knowledge on the procedures for improving a classroom-based assessment
OBA Activities: Item Analysis Results (Difficulty Index and Index of Discrimination); Oral Recitation; Involvement in the G-class
Evidence of Learning: Exercises/Quiz Scores (EQS); Checklist Rating (CLR); Quiz; Class Participation Rating (CPR)
Program Objectives: r, t
Values Integration: Objectivity
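The empirically-based procedures named in this lesson (index of difficulty and index of discrimination) reduce to two short formulas. The sketch below uses the common upper-group/lower-group method; the group size of 10 and the sample response counts are illustrative assumptions, not data from the module.

```python
def difficulty_index(correct: int, examinees: int) -> float:
    """Proportion of examinees answering the item correctly (p-value).
    Values near 1.0 indicate an easy item; values near 0.0, a difficult one."""
    return correct / examinees

def discrimination_index(upper_correct: int, lower_correct: int, group_size: int) -> float:
    """(Upper-group correct minus lower-group correct) divided by group size.
    Positive values mean the item separates high scorers from low scorers."""
    return (upper_correct - lower_correct) / group_size

# Illustrative data: 40 examinees; upper and lower groups of 10 each.
p = difficulty_index(correct=28, examinees=40)                             # 0.70
d = discrimination_index(upper_correct=9, lower_correct=4, group_size=10)  # 0.50
print(p, d)
```

An item with p = 0.70 and d = 0.50 would typically be retained; an item with a near-zero or negative d is a candidate for revision, which is where the judgmental procedures (teacher, peer, and student review) take over.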
… distribution
7. Characterize a frequency distribution graph in terms of skewness and kurtosis
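Outcome 7 above, characterizing a distribution in terms of skewness and kurtosis, can also be checked numerically rather than only from a graph. The sketch below uses the conventional moment-based formulas; the sample test scores are made up for illustration.

```python
from statistics import mean, pstdev

def skewness(scores: list[float]) -> float:
    """Third standardized moment: negative = left-skewed, positive = right-skewed."""
    m, s, n = mean(scores), pstdev(scores), len(scores)
    return sum((x - m) ** 3 for x in scores) / (n * s ** 3)

def kurtosis(scores: list[float]) -> float:
    """Fourth standardized moment: about 3.0 for a normal (mesokurtic) distribution."""
    m, s, n = mean(scores), pstdev(scores), len(scores)
    return sum((x - m) ** 4 for x in scores) / (n * s ** 4)

# Illustrative test scores: one very low score pulls the skew negative
# and produces a heavy tail (kurtosis above 3, i.e., leptokurtic).
scores = [10, 35, 36, 38, 39, 40, 40, 41, 42, 45]
print(round(skewness(scores), 2), round(kurtosis(scores), 2))
```

A negatively skewed score distribution like this one often means the test was easy for most learners, with a few outliers at the low end.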
8. Course Evaluation
Course Requirements: The following are the course requirements: (a) Examinations (Midterm and Final); (b) Quizzes/Exercises; and (c) Class Participation/Involvement.
Course Policies: All students must adhere to these class guidelines: (a) act politely, responsibly, and with maturity; (b) arrive on time and be ready for instruction; (c) set cell phones to silent mode and keep them inside their bags; (d) contribute to an orderly learning environment; (e) consult the professor when deemed necessary; (f) establish good rapport with professors; (g) maintain silence during oral reports/presentations; and (h) cooperate in classroom activities or any task performances.
References
Book
Andrade, H. (2010). Students as the definitive source of formative assessment: Academic self-assessment and the self-regulation of learning. In H. Andrade & G. Cizek (Eds.), Handbook of formative assessment (pp. 90–105). New York, NY: Routledge.
Brookhart, S. M. (2013). How to create and use rubrics for formative assessment and grading. Virginia, USA: ASCD.
David et al. (2020). Assessment in Learning 1. Manila: Rex Book Store.
De Guzman, E. and Adamos, J. (2015). Assessment of Learning 1. Quezon City: Adriana Publishing Co., Inc.
Fives, H. & DiDonato-Barnes, N. (February 2013). Classroom test construction: The power of a table of specifications. Practical Assessment, Research & Evaluation, 18(3).
Hattie, J. (2012). Visible learning for teachers: Maximizing impact on learning. New York: Routledge.
Klenowski, V. (1995). Student self-evaluation processes in student-centred teaching and learning contexts of Australia and England. Assessment in Education: Principles, Policy & Practice, 2(2).
Macayan, J. (2017). Implementing Outcome-Based Education (OBE) framework: Implications for assessment of students’ performance. Educational Measurement and Evaluation
Each criterion is rated as Inadequate (0 points), Developing but below expectations (1 point), Accomplished/Meets Expectations (2 points), or Exemplary/Displays leadership (3 points).

Criteria: Level of Engagement and Active Participation
- Inadequate (0): Student never contributes to class discussion; fails to respond to direct questions.
- Developing (1): Few contributions to class discussion; seldom volunteers but responds to direct questions.
- Accomplished (2): Proactively contributes to class discussion, asking questions and responding to direct questions.
- Exemplary (3): Proactively and regularly contributes to class discussion; initiates discussion on issues related to class topics.

Criteria: Listening Skills
- Inadequate (0): Does not listen when others talk, interrupts, or makes inappropriate comments.
- Developing (1): Does not listen carefully, and comments are often non-responsive to the discussion.
- Accomplished (2): Listens and appropriately responds to the contributions of others.
- Exemplary (3): Listens without interrupting, and incorporates and expands on the contributions of other students.

Criteria: Relevance of Contribution to Topic Under Discussion
- Inadequate (0): Contributions, when made, are off-topic or distract the class from discussion.
- Developing (1): Contributions are sometimes off-topic or distracting.
- Accomplished (2): Contributions are always relevant.
- Exemplary (3): Contributions are relevant and promote deeper analysis of the topic.

Criteria: Preparation
- Inadequate (0): Student is not adequately prepared; does not appear to have read the material in advance of class.
- Developing (1): Student has read the material but not closely, or has read only some of the assigned material in advance of class.
- Accomplished (2): Student has read and thought about the material in advance of class.
- Exemplary (3): Student is consistently well prepared; frequently raises questions or comments on material outside …
Each item is rated on the following scale: 1 = Very poor; 2 = Poor; 3 = Adequate; 4 = Good; 5 = Excellent.

1. Evidence of preparation (organized presentation; presentation/discussion flows well; no awkward pauses or confusion from the group/individual; evidence you did your homework). Rating: 1 2 3 4 5
2. Content (group/individual presented accurate and relevant information; appeared knowledgeable about the case studies assigned and the topic discussed; offered strategies for dealing with the problems identified in the case studies). Rating: 1 2 3 4 5
3. Enthusiasm/Audience Awareness (demonstrates strong enthusiasm about the topic during the entire presentation; significantly increases audience understanding and knowledge of the topic; convinces the audience to recognize the validity and importance of the subject). Rating: 1 2 3 4 5
4. Delivery (clear and logical organization; effective introduction and conclusion; creativity; transitions between speakers; oral communication skills such as eye contact). Rating: 1 2 3 4 5
5. Discussion (group/individual initiates and maintains class discussion concerning the assigned case studies; use of visual aids; good use of time; involves classmates). Rating: 1 2 3 4 5