Module III EDUC 105
MODULE III
1. construct a paper-and-pencil test in accordance with the guidelines in test
construction;
2. choose a test suited to the topics that were discussed within a grading
period/semester;
3. use Bloom's taxonomy as a guide in developing their test;
4. explain the meaning of item analysis, item validity, reliability, item
difficulty and discrimination index; and
5. determine the validity and reliability of given test items.
There are four lessons in the module. Read each lesson carefully, then
answer the exercises/activities to find out how much you have benefited from it.
Work on these exercises carefully and submit your output on time.
In case you encounter difficulty, discuss it with your teacher during a virtual
meeting.
Lesson 1
TRUE-FALSE TEST
Binomial-choice or alternate-response tests are tests that have only two options,
such as true or false, right or wrong, yes or no, good or better, check or cross out,
and so on. A student who knows nothing of the content of the examination would
have a 50% chance of getting the correct answer by sheer guesswork. Although
correction-for-guessing formulas exist, it is best that the teacher ensures that a
true-false item is able to discriminate properly between those who know and
those who are just guessing. A modified true-false test can offset the effect of
guessing by requiring students to explain their answer and to disregard a correct
answer if the explanation is incorrect. Here are some rules of thumb in
constructing true-false items.
Rule 5. Avoid lifting statements from the textbook word for word. Verbatim statements
merely reward memorizing the textbook word for word and, thus, acquisition of
higher-level thinking skills is not given due importance.
Rule 6. Avoid specific determiners or give-away qualifiers. Students quickly learn
that strongly worded statements are more likely to be false than true, for
example, statements with "never," "no," "all," or "always." Conversely,
moderately worded statements, those using "many," "often," "sometimes,"
"generally," "frequently," or "some," are more likely to be true than false.
Such qualifiers should therefore be avoided. e.g., "Executives usually suffer
from hyperacidity." The statement tends to be correct; the word "usually"
leads to the answer.
Rule 7. With true or false questions, avoid a grossly disproportionate number of
either true or false statements or even patterns in the occurrence of true and
false statements.
(Disproportionate)              (Patterned)
1. T    6. F                    1. T    6. F
2. F    7. F                    2. F    7. T
3. F    8. F          or        3. T    8. F
4. F    9. F                    4. F    9. T
5. F    10. F                   5. T    10. F
The multiple choice type of test offers the student more than two (2)
options per item to choose from. Each item in a multiple choice test consists of
two parts: (a) the stem and (b) the options. In the set of options, there is a
"correct" or "best" option while all the others are considered "distracters." The
distracters are chosen in such a way that they are attractive to those who do not
know the answer or who are guessing but, at the same time, have no appeal to
those who actually know the answer. It is this feature of multiple choice
tests that allows the teacher to test higher order thinking skills even if the
options are clearly stated. As in true-false items, there are certain rules of thumb
to be followed in constructing multiple choice tests.
GUIDELINES FOR CONSTRUCTING MULTIPLE CHOICE ITEMS
1) Do not use unfamiliar words, terms and phrases. The ability of the item to
discriminate or its level of difficulty should stem from the subject matter rather
than from the wording of the question.
Example: What would be the system reliability of a computer system whose slave
and peripherals are connected in parallel circuits and each one has a known time
to failure probability of 0.05?
A student completely unfamiliar with the terms "slave" and "peripherals"
may not be able to answer correctly even if he knew the subject matter of
reliability.
2) Do not use modifiers that are vague and whose meanings can differ from one
person to the next such as: much, often, usually, etc.
Example:
Much of the process of photosynthesis takes place in the:
a. Bark
b. Leaf
c. Stem
The qualifier "much" is vague and could have been replaced by a more
specific qualifier like "90% of the photosynthetic process" or some similar
phrase that would be more precise. Be quantitative.
Example:
(Poor) As President of the Republic of the Philippines, Corazon Cojuangco
Aquino would stand next to which President of the Philippine Republic
subsequent to the 1986 EDSA Revolution?
(Better) Who was the President of the Philippines after Corazon C. Aquino?
Example:
(Poor) Which of the following will not cause inflation in the
Philippine economy?
(Better) Which of the following will cause inflation in the
Philippine economy?
Example:
The short story "May Day's Eve" was written by which Filipino author?
a. Jose Garcia Villa
b. Nick Joaquin
c. Genoveva Edrosa Matute
d. Robert Frost
e. Edgar Allan Poe
Had the distracters all been Filipino authors, the value of the item would have
been greatly increased. In this particular instance, only the first three carry the
burden of the entire item since the last two can be essentially disregarded by the
students.
7) All multiple choice options should be grammatically consistent with the stem.
Example:
As compared to the autos of the 1960s, autos in the 1980s _________.
A. traveling slower             C. to use less fuel
B. bigger interiors             D. contain more safety measures
Options A, B and C are obviously wrong to the language-smart because,
when added to the stem, the sentence is grammatically wrong. D is the
only option which, when connected to the stem, retains the grammatical
accuracy of the sentence, and thus is obviously the correct answer.
Example:
If the three angles of two triangles are congruent, then the triangles are:
a. congruent whenever one of the sides of the triangles are
congruent
b. similar
c. equiangular and, therefore, must also be congruent
d. equilateral if they are equiangular
The correct choice, "b," may be obvious from its length and explicitness
alone. The other choices are long and tend to explain why they must be
the correct choices, forcing the students to think that they are, in fact,
not the correct answers!
Example:
a. Who will most strongly disagree with the progressivist who claims that
the child should be taught only that which interests him and, if he is
not interested, to wait until the child gets interested?
A. Essentialist C. Progressivist
B. Empiricist D. Rationalist
b. Which group will most strongly focus its teaching on the interest of the
child?
A. Progressivist C. Perennialist
B. Essentialist D. Reconstructionist
One may arrive at the correct answer to item "b" by looking at item "a,"
which gives the answer to "b" away.
10) Use the "None of the above" option only when the keyed answer is totally
correct. When choice of the "best" response is intended, "none of the above" is
not appropriate, since the implication has already been made that the correct
response may be partially inaccurate.
11) Note that use of "all of the above" may allow credit for partial knowledge. In a
multiple option item (allowing only one option choice), if a student only knew
that two (2) options were correct, he could then deduce the correctness of "all
of the above."
12) Better still, use "none of the above" and "all of the above" sparingly, or best,
not at all.
Here are some guidelines to observe in the formulation of good matching type of
test.
1. Match homogeneous, not heterogeneous, items. The items to match must be
homogeneous. If you want your students to match authors with their literary
works, one column will contain authors and the second column must contain
literary works. Don't insert, for instance, a nationality among the names of
authors; that will not make a good item since the odd entry is obviously wrong.
Example of homogeneous items: the items are all about Filipino heroes, nothing
more.
Match the items in Column A with the items in Column B.
4. To help the examinee find the answer more easily, arrange the options
alphabetically or chronologically, whichever is applicable.
5. Like any other test, the direction of the test must be given. The examinees must
know exactly what to do.
Another useful device for testing lower order thinking skills is the supply type of
test. Like the multiple choice test, the items in this kind of test consist of a
stem and a blank where the students write the correct answer.
Supply type tests depend heavily on the way the stems are constructed. These
tests allow for one answer only and, hence, often test only the students' recall
of knowledge.
It is, however, possible to construct supply type of tests that will test higher order
thinking as the following example shows:
Example: Write an appropriate synonym for each of the following. Each blank
corresponds to a letter:
Metamorphose: _ _ _ _ _ _
Flourish: _ _ _ _
The appropriate synonym for the first is CHANGE with six (6) letters while the
appropriate synonym for the second is GROW with four (4) letters. Notice that these
questions require not only mere recall of words but also understanding of these words.
Another example of a completion type of test that measures higher order thinking
skills is given below:
Example: Write G if the item on the left is greater than the item on the right;
L if the item on the left is less than the item on the right; E if the item on the left
equals the item on the right; and D if the relationship cannot be determined.
            A                              B
1. Square root of 9    ______________   a. -3
2. Square of 25        ______________   b. 615
3. 36 inches           ______________   c. 3 meters
4. 4 feet              ______________   d. 48 inches
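The keyed responses for this example can be double-checked with a short Python snippet. The `compare` helper is ours, not part of the module, and we assume the principal square root is intended in item 1:

```python
import math

def compare(left, right):
    """Return G, L, or E depending on how left compares with right."""
    if left > right:
        return "G"
    if left < right:
        return "L"
    return "E"

# Item 1: principal square root of 9 (= 3) vs. -3
print(compare(math.sqrt(9), -3))   # G
# Item 2: square of 25 (= 625) vs. 615
print(compare(25 ** 2, 615))       # G
# Item 3: 36 inches (= 0.9144 m) vs. 3 meters
print(compare(36 * 0.0254, 3))     # L
# Item 4: 4 feet (= 48 inches) vs. 48 inches
print(compare(4 * 12, 48))         # E
```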
1. Avoid over-mutilated sentences like the test item below. Give enough clues to the student.
The _____ produced by the _____ is used by the green _____ to change the _____ and
_____ into _____. This process is called _____.
2. Avoid open-ended items. There should be only one acceptable answer. The item
below is open-ended and hence not a good test item.
Ernest Hemingway wrote ________.
3. The blank should be at the end or near the end of the sentence. The question must
first be asked before an answer is expected. As in the matching type of test, the
stem (where the question is posed) must come first.
Essays
Essays, classified as non-objective tests, allow for the assessment of higher order
thinking skills. Such tests require students to organize their thoughts on a subject
matter in coherent sentences in order to inform an audience. In essay tests, students
are required to write one or more paragraphs on a specific topic.
Essay questions can be used to measure attainment of a variety of objectives.
1. Comparing
- Describe the similarities and differences between …
- Compare the following methods for ...
2. Relating cause-and-effect
- What are the major causes of ....
- What would be the most likely effects of ...
3. Justifying
- Which of the following alternatives would you favor and why?
- Explain why you agree or disagree with the following statement.
4. Summarizing
- State the points included in ...
TYPES OF ESSAYS
Restricted Essays
Restricted essays are also referred to as short focused responses. Examples are
asking students to "write an example," "list three reasons," or "compare and
contrast two techniques."
Rule 5: Grade all of the answers to one question before going on to the next
question. This procedure also helps offset the halo effect in grading. When all of the
answers on one paper are read together, the grader's impression of the paper as a whole
is apt to influence the grades he assigns to the individual answers.
Rule 6: Evaluate answers to essay questions without knowing the identity of the writer.
The best way to prevent our prior knowledge from influencing our judgment is to
evaluate each answer without knowing the identity of the writer. This can be done by
having the students write their names on the back of the paper or by using code
numbers in place of names.
Rule 7: Whenever possible, have two or more persons grade each answer.
The best way to check on the reliability of the scoring of essay answers is to obtain two
or more independent judgments. Although this may not be a feasible practice for
routine classroom testing, it might be done periodically with a fellow teacher (one who
is equally competent in the area). Obtaining two or more independent ratings becomes
especially vital where the results are to be used for important and irreversible
decisions, such as in the selection of students for further training or for special awards.
Some teachers use cumulative criteria, where each student begins with a score of
100. Points are then deducted every time the teacher encounters a mistake or when a
criterion is missed by the student in his essay.
Rule 8: Do not provide optional questions.
It is difficult to construct questions of equal difficulty, so the teacher cannot make a
valid comparison of students' achievement.
Rule 9: Provide information about the value/weight of the question and how it will be
scored.
Rule 10: Emphasize higher level thinking skills.
Lesson 2
ITEM ANALYSIS:
DIFFICULTY INDEX AND DISCRIMINATION INDEX
The teacher normally prepares a draft of the test. Such a draft is subjected to
item analysis and validation in order to ensure that the final version of the test would
be useful and functional. First, the teacher tries out the draft test on a group of students
with characteristics similar to those of the intended test takers (try-out phase). From the try-out
group, each item will be analyzed in terms of its ability to discriminate between those
who know and those who do not know and also its level of difficulty (item analysis
phase). The item analysis will provide information that will allow the teacher to decide
whether to revise or replace an item (item revision phase). Then, finally, the final draft
of the test is subjected to validation if the intent is to make use of the test as a standard
test for the particular unit or grading period.
Item difficulty = number of students with correct answer/ total number of students.
The item difficulty is usually expressed as a percentage.
Example: What is the item difficulty index of an item if 25 students are unable to
answer it correctly while 75 answered it correctly?
Here, the total number of students is 100, hence the item difficulty index is 75/100
or 75%.
Another example: 25 students answered the item correctly while 75 students did
not. The total number of students is 100, so the difficulty index is 25/100 or 0.25,
which is 25%.
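The two computations above can be sketched in a few lines of Python (a minimal illustration; the function name is ours):

```python
def difficulty_index(num_correct, num_students):
    """Item difficulty: proportion of students who answered the item correctly."""
    return num_correct / num_students

# 75 of 100 students answered correctly -> 0.75, i.e. 75%
print(difficulty_index(75, 100))   # 0.75
# 25 of 100 students answered correctly -> 0.25, i.e. 25%
print(difficulty_index(25, 100))   # 0.25
```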
One problem with this type of difficulty index is that it may not actually indicate
that the item is difficult (or easy). A student who does not know the subject matter
will naturally be unable to answer the item correctly even if the question is easy.
Difficult items tend to discriminate between those who know and those who do not
know the answer. Conversely, easy items cannot discriminate between these two
groups of students. We are therefore interested in deriving a measure that will tell
us whether an item can discriminate between these two groups of students. Such a
measure is called an index of discrimination.
An easy way to derive such a measure is to measure how difficult an item is with
respect to those in the upper 25% of the class and how difficult it is with respect to
those in the lower 25% of the class. If the upper 25% of the class found the item easy
yet the lower 25% found it difficult, then the item can discriminate properly
between these two groups.
Thus:
Index of discrimination = DU – DL (U - Upper group; L - Lower group)
Example: Obtain the index of discrimination of an item if the upper 25% of the class
had a difficulty index of 0.60 (i.e. 60% of the upper 25% got the correct answer)
while the lower 25% of the class had a difficulty index of 0.20. Here. DU = 0.60 while
DL = 0.20, thus index of discrimination = .60 - .20 = .40.
The discrimination index is the difference between the proportion of the top scorers
who got an item correct and the proportion of the lowest scorers who got the item
right. The discrimination index ranges between -1 and +1. The closer the
discrimination index is to +1, the more effectively the item can discriminate or
distinguish between the two groups of students. A negative discrimination index
means more students from the lower group got the item correct; such an item is not
good and so must be discarded.
Example:
Item 1        A     B     C     D
Total         0    20    40    20
Upper 25%     0     5    15     0
Lower 25%     0     5     5    10
Here, the keyed answer is option C.
DU = no. of students in the upper 25% with correct response/ no. of students in the
upper 25%
= 15/20 = .75 or 75%
DL = no. of students in the lower 25% with correct response/ no. of students in the
lower 25%
= 5/20 = .25 or 25%
Discrimination Index = DU – DL = .75 - .25 = .50 or 50%.
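The same computation can be written in Python (a small sketch; the function name is illustrative):

```python
def discrimination_index(upper_correct, upper_total, lower_correct, lower_total):
    """D = DU - DL: difficulty in the upper 25% minus difficulty in the lower 25%."""
    du = upper_correct / upper_total   # proportion of upper group answering correctly
    dl = lower_correct / lower_total   # proportion of lower group answering correctly
    return du - dl

# From the worked example: 15 of 20 upper-group students and
# 5 of 20 lower-group students chose the keyed option.
print(discrimination_index(15, 20, 5, 20))   # 0.5
```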
In the case of the index of difficulty, we have the following rule of thumb: an item
whose difficulty index is above .75 is considered easy, while one below .25 is
considered difficult.
Example:
QUESTION      A     B     C     D
Number 1     20     2     0     3
Number 2     10     5     9     1
**The correct answers were printed in color in the original; for Question 1, the
keyed answer is A, chosen by 20 students.
Compute the difficulty of the item by dividing the number of students who chose
the correct answer (20) by the total number of students (25). Using this formula,
the difficulty of Question #1 (referred to as p) is equal to 20/25 or .80. A "rule-of-
thumb" is that if the item difficulty is more than .75, it is an easy item; if the
difficulty is below .25, it is a difficult item.
Item Number                  1    2    3    4    5
No. of Correct Responses     2   10   20   30   15
No. of Students             50   30   30   30   40
Difficulty Index            __   __   __   __   __
Lesson 3
After performing the item analysis and revising the items which need revision,
the next step is to validate the instrument. The purpose of validation is to determine
the characteristics of the whole test itself, namely, the validity and reliability of the
test. Validation is the process of collecting and analyzing evidence to support the
meaningfulness and usefulness of the test.
Validity
Validity is the extent to which a test measures what it purports to measure; it also
refers to the appropriateness, correctness, meaningfulness and usefulness of the
specific decisions a teacher makes based on the test results. These two definitions of
validity differ in the sense that the first refers to the test itself while the
second refers to the decisions made by the teacher based on the test. A test is valid
when it is aligned with the learning outcome.
A teacher who conducts test validation might want to gather different kinds of
evidence.
There are essentially three main types of evidence that may be collected:
Content-related evidence of validity
Criterion-related evidence of validity
Construct-related evidence of validity
Content-related evidence of validity refers to the content and format of the
instrument. How appropriate is the content? How comprehensive? Does it logically get
at the intended variable? How adequately does the sample of items or questions
represent the content to be assessed? Criterion-related evidence of validity refers to
the relationship between scores obtained using the instrument and scores obtained
using one or more other tests (often called criterion). How strong is this relationship?
How well do such scores estimate present or predict future performance of a certain
type?
Construct-related evidence of validity refers to the nature of the psychological
construct or characteristic being measured by the test. How well does a measure of
the construct reflect the characteristic it is intended to capture?
Reliability
Reliability refers to the consistency of the scores obtained: how consistent they
are for each individual from one administration of an instrument to another and
from one set of items to another. We have already given formulas for computing
the reliability of a test; for internal consistency, for instance, we could use the
split-half method or the Kuder-Richardson formulae (KR-20 or KR-21).
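As a sketch (not taken from the module), KR-20 can be computed from a matrix of 0/1 item scores; here we use the population variance of the total scores, and the data are illustrative:

```python
from statistics import pvariance

def kr20(item_scores):
    """KR-20 internal-consistency reliability.

    item_scores: one row per student, each row a list of 0/1 item scores.
    """
    k = len(item_scores[0])                      # number of items
    n = len(item_scores)                         # number of students
    totals = [sum(row) for row in item_scores]   # each student's total score
    # Sum of p*q over items, where p is the proportion answering correctly
    pq_sum = 0.0
    for i in range(k):
        p = sum(row[i] for row in item_scores) / n
        pq_sum += p * (1 - p)
    return (k / (k - 1)) * (1 - pq_sum / pvariance(totals))

# Four students, three items (made-up data)
scores = [[1, 1, 1],
          [1, 1, 0],
          [1, 0, 0],
          [0, 0, 0]]
print(round(kr20(scores), 2))   # 0.75
```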
Reliability and validity are related concepts. If an instrument is unreliable, it
cannot yield valid results. As reliability improves, validity may improve (or it
may not). However, if an instrument is shown scientifically to be valid, then it is
almost certain that it is also reliable.
Predictive validity compares the question with an outcome assessed at a later
time. An example of predictive validity is a comparison of scores in the National
Achievement Test (NAT) with first semester grade point average (GPA) in college.
Do NAT scores predict college performance? Construct validity refers to the
ability of a test to measure what it is supposed to measure.
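Predictive validity is typically quantified with a correlation coefficient between the two sets of scores. A minimal Pearson-r sketch (the function name and the NAT/GPA figures are illustrative, not real data):

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two paired lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical NAT scores and first-semester GPAs for five students
nat = [80, 85, 90, 95, 100]
gpa = [2.0, 2.3, 2.5, 2.8, 3.0]
print(round(pearson_r(nat, gpa), 3))   # 0.998
```

A coefficient this close to +1 would indicate that NAT scores strongly predict first-semester GPA in this (hypothetical) sample.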
Reliability          Interpretation
.90 and above        Excellent reliability; at the level of the best standardized
                     tests
Lesson 4
The mean, mode and median are valid measures of central tendency, but under different
conditions one measure becomes more appropriate than the others. For example, if the scores
are extremely high and extremely low, the median is a better measure of central tendency,
since the mean is affected by extremely high and extremely low scores.
Median
The median is the middle score for a set of scores arranged from lowest to highest. The median
is less affected by extremely low and extremely high scores. How do we find the median?
65 55 89 56 35 14 56 55 87 45 92
To determine the median, first we have to rearrange the scores into order of magnitude (from
smallest to largest).
14 35 45 55 55 56 56 65 87 89 92
Our median is the score at the middle of the distribution. In this case, 56. It is the middle score.
There are 5 scores before it and 5 scores after it. This works fine when you have an odd number
of scores, but what happens when you have an even number of scores? What if you had 10 scores
like the scores below?
65 55 89 56 35 14 56 55 87 45
Arrange the data in order of magnitude (smallest to largest). Then take the middle
two scores (55 and 56) and compute their average. The median is 55.5. This
gives us a reliable picture of the central tendency of the scores; there are indeed
scores of 55 and 56 in the score distribution.
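Both cases above can be checked with Python's standard library, which sorts the scores internally just as the arrange-then-pick procedure does:

```python
from statistics import median

# The eleven scores (odd count) and ten scores (even count) from the examples
odd_scores = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]
even_scores = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45]

print(median(odd_scores))    # 56: the single middle score
print(median(even_scores))   # 55.5: the average of the two middle scores
```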
Mode
The mode is the most frequent score in a data set. On a histogram or bar chart, it
represents the highest bar. If a score represents the number of times an option is
chosen in a multiple choice test, then the mode is the most popular option. Study the
score distribution given below:
14 35 45 55 55 56 56 65 87 89