Timon, Vince Florbe B.

EDU 533 (A-COE- 12)


X=test scores (write all scores)
f=frequency (count the number of students with the same X)
Percent= divide the frequency by the number of examinees (N), then multiply by 100 (f/N × 100)
cumulative percent= add the percent of each score to the cumulative percent of the preceding score (a running total that ends at 100%)
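The column definitions above can be sketched in Python as follows. This is a minimal illustration, assuming ungrouped whole-number scores and cumulating from the lowest score upward, so each cumulative percent reads as the share of examinees scoring at or below X; the function name `frequency_table` and the sample scores are hypothetical.

```python
from collections import Counter

def frequency_table(scores):
    """Return (X, f, percent, cumulative percent) rows, lowest score first."""
    n = len(scores)                 # number of examinees
    counts = Counter(scores)        # f: how many examinees got each score X
    rows = []
    cumulative = 0.0
    for x in sorted(counts):
        f = counts[x]
        percent = f * 100 / n       # f / N x 100
        cumulative += percent       # running total of the percents so far
        rows.append((x, f, percent, cumulative))
    return rows

# Hypothetical scores of 10 examinees
scores = [85, 90, 85, 70, 90, 90, 75, 80, 85, 90]
for x, f, pct, cum in frequency_table(scores):
    print(f"{x:>3} {f:>2} {pct:6.1f} {cum:6.1f}")
```

The cumulative percent in the last row should always come out to 100.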
Conventions in presenting test data grouped in a frequency distribution:
1. As much as possible, the size of the class intervals should be equal. Class intervals that are
multiples of 5, 10, 100, etc., are often desirable. At times, large gaps exist in the data and
unequal class intervals are used; such intervals cause inconvenience in the preparation of
graphs and the computation of certain descriptive statistical measures. Use this formula in
estimating necessary class intervals:
The conventional number of classes used to group the data generally varies from 7 to 20. As seen in
Table 3, the size of the class interval is 5, which is an odd number. If you look at the midpoints,
these are all whole numbers. If the class size is an even number, the midpoints will contain
decimal numbers, which may add some difficulty to the computation of some
important measures.
2. Start the class interval at a value that is a multiple of the class width. In Table 3, the class
interval is 5, so the first class starts at 20, which is a multiple of 5, and the class 20-24
includes the lowest test score seen in Table 1.
3. As much as possible, open-ended class intervals (e.g., 100 and below or 100 and above)
should be avoided. These cause problems in graphing and in computing descriptive
statistical measures.
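Conventions 1 and 2 can be illustrated with a short sketch that builds equal-width class intervals starting at a multiple of the width and reports each midpoint. It assumes whole-number scores and takes the width as an input, since the estimation formula mentioned above is not reproduced in these notes; `class_intervals` and the sample scores are hypothetical.

```python
def class_intervals(scores, width):
    """Return (lower limit, upper limit, midpoint, f) rows, lowest class first.

    Assumes whole-number scores, so a class such as 20-24 holds `width`
    whole numbers.
    """
    lowest, highest = min(scores), max(scores)
    lower = (lowest // width) * width      # start at a multiple of the width (convention 2)
    rows = []
    while lower <= highest:
        upper = lower + width - 1
        midpoint = (lower + upper) / 2
        f = sum(1 for s in scores if lower <= s <= upper)
        rows.append((lower, upper, midpoint, f))
        lower += width                     # equal class widths (convention 1)
    return rows

scores = [23, 27, 31, 35, 35, 38, 42, 44]  # hypothetical scores
for row in class_intervals(scores, 5):     # odd width: whole-number midpoints
    print(row)
print(class_intervals(scores, 4)[0])       # even width: (20, 23, 21.5, 1)
```

With a width of 5 every midpoint is a whole number, while a width of 4 produces midpoints such as 21.5, which is the difficulty the notes describe.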

The most common type of graph used to evaluate behavioral data is the line graph. A line
graph shows individual data points connected by lines, creating a path. Over time, this path can
show a visual pattern that helps you evaluate the overall direction of a behavior.
Another common graph used is referred to as a bar graph. A bar graph is often used when
portions of a whole are being represented or when reporting a percentage. The bar graph
focuses on the height of the data rather than the trend in the data, and is most often used when
non-consecutive data points are being evaluated. This is a particularly useful method when
comparing information across individuals, settings, or situations.
 Histogram is a type of graph appropriate for quantitative data such as test scores. This
graph consists of columns, each with a base that represents one class interval and a
height that represents the number of observations (the frequency) in that class.
 Frequency polygon is also used for quantitative data, and it is one of the most
commonly used methods in presenting test scores. It is the line graph of a frequency
distribution. It is very similar to a histogram, but instead of bars it uses lines joining the
class midpoints, which makes it convenient to compare sets of test data on the same axes.
 Cumulative frequency polygon is quite different from a frequency polygon because
cumulative frequencies are plotted. In addition, you plot each point above the upper exact
limit of the interval. As such, a cumulative frequency polygon gives a picture of the number of
observations that fall below a certain score instead of the frequency within a class interval.
 Pie graph may be useful when representing portions of a whole. For instance, it might
be helpful to create a pie chart indicating the amount of time a student spends actively
engaged in activities.
 Skewness is the degree of asymmetry of a graph. A basic principle of the coordinate
system tells you that as you move toward the right of the x-axis, the numerical value
increases. Likewise, as you move up the y-axis, the scale value becomes higher. Thus,
in a negatively skewed distribution, more learners get higher scores, and the tail,
which indicates lower frequencies, points to the left, toward the lower scores. On
the other hand, in a positively skewed distribution, the scores are clustered on the left
side. This means that more learners get lower scores, and the tail, which indicates
lower frequencies, is on the right, toward the higher scores.
 Kurtosis is a statistical measure used to describe the degree to which scores cluster in
the tails or the peak of a frequency distribution. The peak is the tallest part of the
distribution, and the tails are the ends of the distribution.
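The points plotted for the frequency polygon and the cumulative frequency polygon described above can be computed from a grouped frequency table. This sketch assumes whole-number class limits, so each upper exact limit is the upper limit plus 0.5; the interval data and the function name `polygon_points` are hypothetical, and any plotting tool could then draw the two point lists.

```python
def polygon_points(intervals):
    """intervals: (lower, upper, f) rows with whole-number limits, lowest first.

    Returns the frequency-polygon points (midpoint, f) and the
    cumulative-polygon points (upper exact limit, cumulative f).
    """
    freq_points, cum_points = [], []
    cf = 0
    for lower, upper, f in intervals:
        freq_points.append(((lower + upper) / 2, f))  # plotted above the midpoint
        cf += f
        cum_points.append((upper + 0.5, cf))          # plotted above the upper exact limit
    return freq_points, cum_points

# Hypothetical grouped data: (lower limit, upper limit, frequency)
table = [(20, 24, 1), (25, 29, 1), (30, 34, 1), (35, 39, 3), (40, 44, 2)]
freq_points, cum_points = polygon_points(table)
print(freq_points)
print(cum_points)
```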

 MESOKURTIC- Distributions that are moderate in breadth, with a curve of medium peaked
height. When kurtosis (measured as excess kurtosis) is equal to 0, as in the normal distribution, the distribution is mesokurtic.
 LEPTOKURTIC- Leptokurtic distributions have positive kurtosis values. A leptokurtic
distribution has a higher peak and fatter, heavier tails than a normal distribution.
 PLATYKURTIC- Platykurtic distributions have fewer values in the tails and fewer values close to
the mean (i.e., the curve has a flat peak and more dispersed scores, with lighter tails). Negative
values of kurtosis indicate that a distribution is flat and has thin tails.
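Skewness and kurtosis can also be expressed as numbers. The sketch below uses the moment coefficients in their population form (skewness = m3 / m2^1.5 and excess kurtosis = m4 / m2^2 − 3, where m_k is the k-th central moment); statistical software often applies small-sample corrections, so its values may differ slightly. The function names and the four data sets are made up to show each shape.

```python
from statistics import fmean

def central_moment(scores, k):
    """k-th central moment: the mean of (x - mean) ** k."""
    m = fmean(scores)
    return fmean((x - m) ** k for x in scores)

def skewness(scores):
    """Negative: tail toward the low scores; positive: tail toward the high scores."""
    return central_moment(scores, 3) / central_moment(scores, 2) ** 1.5

def excess_kurtosis(scores):
    """About 0: mesokurtic; above 0: leptokurtic; below 0: platykurtic."""
    return central_moment(scores, 4) / central_moment(scores, 2) ** 2 - 3

high_cluster = [10, 30, 38, 40, 40, 42]    # most learners scored high
low_cluster = [0, 2, 2, 4, 12, 40]         # most learners scored low
flat = [1, 2, 3, 4, 5, 6, 7, 8, 9]         # evenly spread scores
peaked = [0, 5, 5, 5, 5, 5, 5, 5, 10]      # tall peak with a few extreme scores

print(skewness(high_cluster), skewness(low_cluster))
print(excess_kurtosis(flat), excess_kurtosis(peaked))
```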

Measures of Central Tendency provide a summary measure that attempts to describe a whole
set of data with a single value that represents the middle or center of its distribution. There are
three main measures of central tendency: the mean, the median and the mode.
When data is normally distributed, the mean, median and mode should be identical, and are all
effective in showing the most typical value of a data set. It's important to look at the dispersion
of a data set when interpreting the measures of central tendency.
 The mean of a data set is also known as the average value. It is calculated by dividing
the sum of all values in a data set by the number of values.
 The median of a data set is the value in the middle of the data set arranged from
smallest to largest; when there is an even number of values, it is the average of the two
middle values. The median is appropriate to use with ordinal variables, and with
interval variables with a skewed distribution.
 The mode is the most common observation in a data set, or the value in the data set
that occurs most frequently. The mode is an appropriate measure to use with categorical
data. One disadvantage of the mode is that a data set can have more than one mode
(e.g., in 1, 2, 2, 3, 4, 5, 5, both 2 and 5 are modes).
 Range is the difference between the highest (XH) and the lowest (XL) scores in a
distribution (R = XH − XL). It is the simplest measure of variability, but it is also considered
the least accurate measure of dispersion because its value is determined by just two scores in
a group. It does not take into consideration the spread of all scores; its value simply depends
on the highest and lowest scores, so it could be drastically changed by a single extreme value.
 Standard deviation is the most widely used measure of variability and is considered
the most accurate in representing the deviations of individual scores from the mean
of the distribution. It is a statistic that measures the amount of dispersion or variation
in a set of values, and it is generally used when we need to describe how far the values
lie from the mean.
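Python's standard `statistics` module covers the measures above (`multimode` needs Python 3.8 or later). This sketch reuses the bimodal example from the mode bullet; it shows both the population standard deviation (`pstdev`, dividing by N) and the sample version (`stdev`, dividing by N − 1), since the notes do not say which one is intended.

```python
import statistics

scores = [1, 2, 2, 3, 4, 5, 5]           # the example used for the mode

mean = statistics.mean(scores)           # sum of values / number of values
median = statistics.median(scores)       # middle value after sorting -> 3
modes = statistics.multimode(scores)     # all most-frequent values -> [2, 5]
score_range = max(scores) - min(scores)  # XH - XL -> 4
pop_sd = statistics.pstdev(scores)       # divides by N
sample_sd = statistics.stdev(scores)     # divides by N - 1

# With an even number of values, the median averages the two middle ones.
even_median = statistics.median([70, 80, 85, 90])   # -> 82.5
```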

 Quartile. In measures of central tendency, you learned that the median of a distribution
divides the data into two equal groups. In a similar way, the quartiles are the three
values that divide a set of scores into four equal parts, with one-fourth of the data values
in each part. This means about 25% of the data falls at or below the first quartile (Q1),
50% of the data falls at or below the second quartile (Q2), and 75% falls at or below the third
quartile (Q3). Notice that Q2 is also the median. We can say that Q1 is the median of the
first half of the values and Q3 is the median of the second half of the values. Thus, the
upper quartile (Q3) represents the middle mark of the top half of the class, while the
lower quartile (Q1) represents the middle mark of the bottom half of the class.
 Decile. Deciles divide the distribution into 10 equal parts. There are 9 deciles, such that 10%
of the scores are equal to or less than decile 1, 20% of the scores are equal to or less
than decile 2, and so on. A student whose mark is below the first decile is said to belong
to decile 1, a student whose mark is between the first and second deciles is in decile 2,
and one whose mark is above the ninth decile belongs to decile 10. If there are only a small
number of data values, deciles are not appropriate.
 Percentiles indicate the percentage of scores that fall below a particular value. They tell
you where a score stands relative to other scores. For example, a person with an IQ of
120 is at the 91st percentile, which indicates that their IQ is higher than 91 percent of
other scores. Percentiles are a great tool to use when you need to know the relative
standing of a value.
 The normal distribution is the most important probability distribution in statistics
because it fits many natural phenomena. The normal distribution is a probability function
that describes how the values of a variable are distributed. It is a symmetric distribution
where most of the observations cluster around the central peak and the probabilities for
values further away from the mean taper off equally in both directions. Extreme values in
both tails of the distribution are similarly unlikely.
 Measures of Covariability tell you, to a certain extent, how two tests or two factors are
related. Admittedly, the score a learner gets may be due not only to a single factor but also
to other directly or indirectly observable factors, which are also related to one another.
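The position measures, the normal curve, and a covariability measure from the list above can be sketched with Python's `statistics` module (3.8 or later). Assumptions: the 15 class scores are hypothetical; `statistics.quantiles` uses its default "exclusive" method, and hand-computation methods in textbooks may give slightly different quartiles and deciles; the IQ example assumes a mean of 100 and a standard deviation of 15; Pearson's r is used here as one common measure of covariability; `decile_group` and `pearson_r` are illustrative helpers.

```python
import statistics
from bisect import bisect_right
from statistics import NormalDist

# Hypothetical class of 15 test scores, sorted for readability.
scores = [12, 15, 18, 20, 22, 25, 25, 28, 30, 32, 35, 38, 40, 41, 45]

# Quartiles: three cut points splitting the scores into four parts.
q1, q2, q3 = statistics.quantiles(scores, n=4)

# Deciles: nine cut points D1..D9 splitting the scores into ten parts.
deciles = statistics.quantiles(scores, n=10)

def decile_group(score, cuts):
    """Below D1 -> 1, between D1 and D2 -> 2, ..., above D9 -> 10."""
    return bisect_right(cuts, score) + 1

# Percentile on a normal curve: IQ 120, assuming mean 100 and SD 15.
iq = NormalDist(mu=100, sigma=15)
iq_120_percentile = round(iq.cdf(120) * 100)

# Normal curve: share of values within 1 SD of the mean (about 0.683).
within_one_sd = NormalDist().cdf(1) - NormalDist().cdf(-1)

def pearson_r(x, y):
    """Pearson correlation between two sets of paired scores (-1 to +1)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

print(q1, q2, q3, iq_120_percentile)
```

For these scores Q1 and Q3 match the medians of the lower and upper seven scores (20 and 38), and the IQ calculation lands on the 91st percentile quoted in the text.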

Grades are alphabetical or numerical symbols/ marks that indicate the degree to which learners
are able to achieve their learning objectives. They are part of the instructional process and
serve as feedback on what specific topic/s learners have mastered and what they need to focus
on when preparing for summative assessments. Sometimes, grades may serve as
motivators for some learners to maintain or improve their performance. They give parents
information about their children’s achievements. They are also useful for administrators who
want to evaluate the effectiveness of the instructional programs in developing the needed skills
and competencies of the learners.
Traditional Methods in scoring performance tasks
1. Number right scoring (NR) entails assigning positive values only to correct answers while
giving a score of zero to incorrect answers. The test score is the sum of the scores for correct
responses. One major concern with this scoring method is that learners may get the correct
answer by guessing, which affects the test's reliability and validity.
2. Negative marking (NM) entails assigning positive values to correct answers while penalizing
the learners for incorrect responses (right-minus-wrong method). In this model, a fraction of the
number of wrong answers is subtracted from the number of correct answers; the usual fraction is
1/(n − 1), where n stands for the number of choices, so that Score = Right − Wrong/(n − 1).
Other models for this type of scoring include: a. giving a positive score to correct answers while
assigning no mark for omitted items; b. rewarding learners for not guessing by awarding points for
omitted items rather than penalizing incorrect answers.
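Number-right and right-minus-wrong scoring can be compared on one answer sheet. This sketch assumes omitted items are recorded as `None` and counted as neither right nor wrong, and applies the correction Score = R − W/(n − 1); the answer key, responses, and function names are hypothetical.

```python
def number_right(responses, key):
    """Number-right (NR) scoring: 1 point per correct answer, 0 otherwise."""
    return sum(1 for r, k in zip(responses, key) if r == k)

def right_minus_wrong(responses, key, n_choices):
    """Negative marking: R - W / (n - 1); omitted items (None) are not counted as wrong."""
    right = number_right(responses, key)
    wrong = sum(1 for r, k in zip(responses, key) if r is not None and r != k)
    return right - wrong / (n_choices - 1)

key       = ["A", "B", "C", "D", "A", "B", "C", "D", "A", "B"]
responses = ["A", "B", "C", "D", "A", "C", "D", None, None, "A"]
nr = number_right(responses, key)                     # 5
nm = right_minus_wrong(responses, key, n_choices=4)   # 5 - 3/3 = 4.0
```

With four choices, each wrong answer costs one-third of a point, so blind guessing gains nothing on average.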
Non-conventional Methods in scoring performance tasks
1. Partial Credit scoring methods attempt to determine a learner's degree or level of
knowledge with respect to each response option given. This method of scoring takes into
account the partial knowledge mastery of learners. It acknowledges that, while learners cannot
always recognize the correct answer, they can discern that some response options are clearly
incorrect.
a. Liberal Choice test - allows learners to select more than one answer to a question if they feel
uncertain which option or alternative is correct
b. Elimination testing (ET) - instructs learners to cross out all alternatives they consider to be
incorrect
c. Confidence Weighting (CW) - asks learners to indicate what they believe is the correct
answer and how confident they are about their choice.
2. Retrospective Correcting for Guessing considers omitted or no-answer items as incorrect,
forcing learners to give an answer for every item even if they do not know the answer. The
correction for guessing is implemented later or retroactively. This can be done by
comparing learners' answers in multiple-choice items with their answers in other test formats,
such as a short-answer test.
3. Standard-setting entails using standards when scoring multiple-choice items, particularly
standards set through norm-referenced or criterion-referenced assessments. Standards based
on norm-referenced assessments are derived from the test performance of a certain group of
learners, while standards from criterion-referenced assessment are based on preset standards
specified from the very start by the teacher or school in general.
4. Holistic Scoring involves giving a single, overall assessment score for an essay, writing
composition, or other performance-type assessment as a whole. Although the scoring rubric for
holistic scoring lays out specific criteria for evaluating a task, raters do not assign a score for
each criterion. Instead, as they read a writing task or observe a performance task, they balance
strengths and weaknesses among the various criteria to arrive at an overall assessment.
Holistic scoring is considered efficient in terms of time and cost. It also does not penalize poor
performance based on only one aspect (e.g., content, delivery, or organization). However, it is said
that holistic scoring does not provide sufficient diagnostic information about the students’ ability
as it does not identify the areas for improvement and is difficult to interpret as it does not detail
the basis for evaluation.
5. Analytic Scoring involves assessing each aspect of a performance task and assigning a
score for each criterion. Sometimes, an overall score is given by averaging the scores in all
criteria. One advantage of analytic scoring is its reliability. It also provides diagnostic
information, as it presents learners' strengths and weaknesses and the specific area/s involved,
which can eventually serve as a basis for remedial instruction. However, it is more time-consuming
and therefore more expensive. It is also prone to the halo effect, wherein the score on one scale
may influence the ratings on the others. Analytic scoring rubrics are also difficult to create.
6. Primary Trait Scoring focuses on only one aspect or criterion of a task, and a learner’s
performance is evaluated based on a trait. This scoring system defines a primary trait in the task
that will then be scored. For example, if a teacher in a political science class asks students to
write an essay on the advantages and disadvantages of Martial Law, the basic question
addressed in scoring is, “Did the writer successfully accomplish the purpose of this task?” With
this focus, the teacher would ignore errors in the conventions of written language and instead
focus on overall rhetorical effectiveness. One disadvantage of this scoring scheme is that it is
often difficult to focus exclusively on one trait, such that other traits may be included when
scoring. Thus, it is important that a very detailed scoring guide is used for each specific task.
7. Multi-trait scoring requires that an essay test or performance task is scored on more than
one aspect, with scoring criteria in place so that they are consistent with the prompt. Multiple-
trait scoring is task-specific, and the features to be scored vary from task to task; thus requiring
separate scores for different criteria. Multiple-trait scoring is similar to analytic scoring because
of its focus on several categories of criteria. However, while analytical scoring evaluates more
traditional and generic dimensions of language production, multiple-trait scoring focuses on
specific features of performance required to fulfill the given task or tasks. For example, in a PE
class, a learner's basketball performance may be scored on different skills such as dribbling,
passing, rebounding, blocking, and stealing.
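The averaging step mentioned under analytic scoring, and the diagnostic information it preserves, can be sketched as follows. The criteria, the 1-4 rating scale, and the function name `analytic_score` are hypothetical; a holistic rater would record only a single overall number, so the per-criterion breakdown would not exist.

```python
from statistics import fmean

def analytic_score(ratings):
    """Return the overall (average) rating and the lowest-rated criterion."""
    overall = fmean(ratings.values())             # average across all criteria
    weakest = min(ratings, key=ratings.get)       # area to target for remediation
    return overall, weakest

# Hypothetical ratings on a 1-4 rubric scale for one learner's speech.
ratings = {"content": 4, "organization": 3, "delivery": 2}
overall, weakest = analytic_score(ratings)
print(overall, weakest)
```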
Types of Test Scores
1. Raw Score is simply the number of items answered correctly in a test. A raw score provides
an indication of the variability in the performance of students in the class. However, a raw score
has no meaning unless you know what the test is measuring and how many items it contains. A
raw score also does not mean much because it cannot be compared with a standard or with the
performance of another learner or of the class as a whole.
2. Percentage Score refers to the percent of items answered correctly in a test. The number of
items answered correctly is typically converted to a percent based on the total possible score. The
percentage score is interpreted as the percentage of the content, skills, or knowledge that the
learner has a solid grasp of. Just like the raw score, the percentage score has limitations because
there is no way of comparing the percentage correct obtained in one test with the percentage
correct in another test with a different difficulty level. Percentage score is most appropriate to use
in a teacher-made test or criterion-referenced test, particularly a teacher-made test that is
administered commonly to a class or to students taking the same course with the same course
syllabus. In this way, the students' test performances can be compared with each other in the
class or with their peers in another section. In the same manner, percentage score is suitable to
use in subjects where a standard score has been set.
3. Criterion-referenced grading system is a grading system wherein learners' test scores or
achievement levels are based on their performance in specific learning goals and outcomes or standards.
a. Pass or fail grade is most appropriate if the test or assessment is primarily or entirely to
make a pass or fail decision. In this type of scoring, a standard or cutoff score is preset, and a
learner is given a score of pass if he or she surpasses the expected level of performance or
cutoff score. This is most appropriate for comprehensive or licensure exams because there is
no limit to the number of examinees who can pass or fail. Each individual examinee’s
performance is compared to an absolute standard and not to the performance of others.
b. Letter grade is one of the most commonly used grading systems. Letter grades are usually
composed of a five-level grading scale labeled from A to E or F, with A representing the highest
level of achievement or performance and E or F representing the lowest.
c. Plus (+) and minus (-) letter grades provide more detailed descriptions of the level of
learners' achievement or task performance by dividing each grade category into three levels,
such that a grade of A can be assigned as A+, A, or A-, and so on.
d. Categorical grades are generally more descriptive than letter grades, especially if coupled
with verbal labels. Verbal labels eliminate the need for a key or legend to explain what each
grade category means.
4. Norm-referenced grading system compares learners' test scores with their peers' test
scores. This involves ranking to express the learner's score in relation to the achievement of the
norm group.
a. Developmental scores are scores transformed from raw scores that reflect the average
performance at given age and grade levels.
I. Grade-equivalent score is described as both a growth score and a status score. The
grade-equivalent score of a given raw score in any test indicates the grade level at which the
typical learner earns that raw score. A decimal point is used between the grade and the month in
grade equivalents. Ex. a score of 6.5 means that the learner did as well as a typical Grade 6
learner taking the test at the end of the fifth month of the school year.
II. Age-equivalent score indicates the age level at which it is typical for a learner to obtain such
a raw score. It reflects a learner's performance in terms of chronological age as compared to
those in the norm group. These scores are written with a hyphen between the years and months.
If a learner's score is 12-3, the age equivalent is 12 years and 3 months, indicating a test
performance similar to that of typical 12-year-3-month-olds in the norm group.
b. Percentile rank indicates the percentage of scores that fall at or below a given score.
Percentile ranks range from 1 to 99. For example, if a student obtained a percentile rank of 85
in a standardized achievement test, it means that the learner scored as high as or higher than
85% of the learners in the norm group.
c. Stanine Score expresses test results in nine equal steps, which range from one (lowest) to
nine (highest). A stanine score of 5 is interpreted as an average stanine. Percentile ranks are
grouped into stanines with the following interpretations:
d. Standard Scores (discussed in the previous modules)
 Z-score
 T-score
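The types of test scores above can be computed from the same raw data. Everything marked hypothetical is an assumption for illustration, not a standard from these notes: the 75% passing cutoff, the 10-point letter bands and their +/- split, and the class scores. The norm-referenced part uses z = (X − mean)/SD with the population SD, T = 50 + 10z, the common approximation stanine ≈ 2z + 5 (rounded and kept within 1-9), and percentile rank as the percent of scores at or below X, reported within 1-99.

```python
import math
from statistics import fmean, pstdev

# Criterion-referenced scores
PASSING_CUTOFF = 75.0                                       # hypothetical cutoff (percent)
LETTER_BANDS = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]  # hypothetical lower bounds

def percentage_score(correct, total_items):
    """Percent of items answered correctly."""
    return correct * 100 / total_items

def pass_fail(pct, cutoff=PASSING_CUTOFF):
    """Assumes that reaching the cutoff exactly counts as passing."""
    return "Pass" if pct >= cutoff else "Fail"

def letter_grade(pct, plus_minus=False):
    """Letter grade from the hypothetical bands, optionally split into +, plain, -."""
    for lower, letter in LETTER_BANDS:
        if pct >= lower:
            if not plus_minus:
                return letter
            position = pct - lower            # distance into the band
            if position >= 7:
                return letter + "+"
            if position < 3:
                return letter + "-"
            return letter
    return "F"

# Norm-referenced scores
def z_score(x, scores):
    """z = (X - mean) / SD, using the population SD of the group."""
    return (x - fmean(scores)) / pstdev(scores)

def t_score(z):
    """T = 50 + 10z."""
    return 50 + 10 * z

def stanine(z):
    """Approximate stanine: 2z + 5, rounded half up and kept within 1-9."""
    return max(1, min(9, math.floor(2 * z + 5 + 0.5)))

def percentile_rank(x, scores):
    """Percent of scores at or below x, reported within 1-99."""
    pct = 100 * sum(1 for s in scores if s <= x) / len(scores)
    return max(1.0, min(99.0, pct))

class_scores = [60, 65, 70, 70, 75, 80, 80, 85, 90, 95]   # hypothetical
pct = percentage_score(38, 50)                            # 76.0
z = z_score(85, class_scores)
print(pct, pass_fail(pct), letter_grade(pct), round(t_score(z), 2), stanine(z))
```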
Guidelines in grading tests/performance tasks
1. Stick to the purpose of the assessment.
2. Be guided by the desired learning outcomes
3. Develop grading criteria
4. Inform learners what scoring methods are to be used
5. Decide on what type of test scores to use
Guidelines in grading essay tests
1. Identify the criteria for writing essay
2. Determine the type of rubric to be used
3. Prepare the rubric
4. Evaluate the essay anonymously
5. Score one essay at a time
6. Be conscious of your own biases when evaluating papers
7. Review initial scores and comments before giving the final rating
8. Get two or more raters
9. Write comments (feedback)

The K to 12 Basic Education Program uses a standards- and competency-based grading
system. These are found in the curriculum guides. All grades will be based on the weighted raw
score of the learners’ summative assessments. The minimum grade needed to pass a specific
learning area is 60, which is transmuted to 75 in the report card. The lowest mark that can
appear on the report card is 60 for Quarterly Grades and Final Grades. For these guidelines, the
Department will use a floor grade considered as the lowest possible grade that will appear in a
learner’s report card. Learners from Grades 1 to 12 are graded on Written Work, Performance
Tasks, and Quarterly Assessment every quarter. These three are given specific percentage
weights that vary according to the nature of the learning area.
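The weighting of Written Work, Performance Tasks, and Quarterly Assessment can be sketched as a weighted sum. This is a minimal sketch, assuming each component's raw score is first converted to a percentage of its highest possible score before weighting; the weights, scores, and names are hypothetical, since the actual weights vary by learning area. The transmutation table is not reproduced in these notes, so the sketch stops at the initial grade and only checks it against the minimum of 60 mentioned above.

```python
# Hypothetical component weights; actual weights vary by learning area.
WEIGHTS = {"written_work": 0.30, "performance_tasks": 0.50, "quarterly_assessment": 0.20}
PASSING_INITIAL_GRADE = 60   # from the text: 60 is transmuted to 75 on the report card

def initial_grade(raw, highest_possible, weights=WEIGHTS):
    """Weighted sum of each component's percentage score (before transmutation)."""
    total = 0.0
    for component, weight in weights.items():
        percentage = raw[component] * 100 / highest_possible[component]
        total += percentage * weight
    return total

# Hypothetical raw scores and highest possible scores for one quarter
raw = {"written_work": 80, "performance_tasks": 90, "quarterly_assessment": 35}
hps = {"written_work": 100, "performance_tasks": 100, "quarterly_assessment": 50}
grade = initial_grade(raw, hps)          # 80(0.30) + 90(0.50) + 70(0.20) = 83
meets_minimum = grade >= PASSING_INITIAL_GRADE
```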
Guidelines specific to the assessment of Kindergarten learners will be issued in a different
memorandum or order. However, for Kindergarten, checklists and anecdotal records are used
instead of numerical grades. These are based on learning standards found in the Kindergarten
curriculum guide. It is important for teachers to keep a portfolio, which is a record or compilation
of the learner’s output, such as writing samples, accomplished activity sheets, and artwork. The
portfolio can provide concrete evidence of how much or how well the learner is able to
accomplish the skills and competencies. Through checklists, the teacher will be able to indicate
whether or not the child is able to demonstrate knowledge and/or perform the tasks expected of
Kindergarten learners. Through anecdotal records or narrative reports, teachers will be able to
describe learners’ behavior, attitude, and effort in school work.
For MAPEH, individual grades are given to each area (i.e., Music, Arts, PE, and Health). The
quarterly grade for MAPEH is the average of the quarterly grades across the four areas:
QG for MAPEH = (QG for Music + QG for Arts + QG for PE + QG for Health) ÷ 4
The final grade for each learning area is then computed by getting the average of the four
quarterly grades, as seen below:
Final Grade for each learning area = (1QG + 2QG + 3QG + 4QG) ÷ 4
The General Average, on the other hand, is computed by getting the average of the Final Grades
for all subject areas. Each subject area has equal weight.
General Average = sum of the Final Grades of all learning areas ÷ total number of learning areas in a grade level
All grades reflected on the report card are reported as whole numbers.
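The three averaging formulas can be checked with a short sketch. Assumptions: grades are averaged unrounded and rounded to a whole number only when reported, with .5 rounded up; Python's built-in `round()` rounds halves to the nearest even number (`round(88.5)` gives 88), so the sketch uses `decimal.ROUND_HALF_UP` instead. All grades and function names are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP
from statistics import fmean

def to_whole_number(x):
    """Round to a whole number, with .5 rounded up (assumed reporting convention)."""
    return int(Decimal(str(x)).quantize(Decimal("1"), rounding=ROUND_HALF_UP))

def mapeh_quarterly_grade(music, arts, pe, health):
    return (music + arts + pe + health) / 4

def final_grade(qg1, qg2, qg3, qg4):
    return (qg1 + qg2 + qg3 + qg4) / 4

def general_average(final_grades):
    """Equal-weight average of the final grades of all learning areas."""
    return fmean(final_grades)

mapeh_q1 = to_whole_number(mapeh_quarterly_grade(85, 88, 90, 86))   # 87.25 -> 87
math_final = to_whole_number(final_grade(87, 88, 90, 89))            # 88.5 -> 89
gen_avg = to_whole_number(general_average([89, 85, 90, 92, 88, 87, 91, 86]))  # 88.5 -> 89
```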

In education, the term stakeholder typically refers to anyone who is invested in the welfare and
success of a school and its students, including administrators, teachers, staff members,
students, parents, families, community members, local business leaders, and elected officials
such as school board members, city councilors, and state representatives. Stakeholders may
also be collective entities, such as local businesses, organizations, advocacy groups,
committees, media outlets, and cultural institutions, in addition to organizations that represent
specific groups, such as teachers unions, parent-teacher organizations, and associations
representing superintendents, principals, school boards, or teachers in specific academic
disciplines.
The idea of a “stakeholder” intersects with many school-reform concepts and strategies, such
as leadership teams, shared leadership, and voice, that generally seek to expand the
number of people involved in making important decisions related to a school’s organization,
operation, and academics. For example, shared leadership entails the creation of leadership
roles and decision-making opportunities for teachers, staff members, students, parents, and
community members, while voice refers to the degree to which schools include and act upon the
values, opinions, beliefs, perspectives, and cultural backgrounds of the people in their
community. Stakeholders may participate on a leadership team, take on leadership
responsibilities in a school, or give “voice” to their ideas, perspectives, and opinions during
community forums or school-board meetings, for example.

