Traditional Assessments vs. Authentic Assessments

Traditional: Provides teachers a snapshot of what the students know.
Authentic: Provides teachers a more complete picture of what the students know and what they can do with what they know.

Traditional: Measures students' knowledge of the content.
Authentic: Measures students' ability to apply knowledge of the content in real-life situations; the ability to use and apply what they have learned in meaningful ways.

Traditional: Tests and strengthens the students' ability to recall/recognize and comprehend content, but does not reveal the students' true progress in what they can do with the knowledge they acquired; only the students' lower-level thinking skills (knowledge and comprehension) are tapped.
Authentic: Tests and strengthens the students' ability to reason, analyze, synthesize, and apply knowledge acquired; students' higher-level cognitive skills (from knowledge and comprehension to analysis, synthesis, application, and evaluation) are tapped in multiple ways.

Traditional: Teachers serve as evaluators and students as the evaluatees; teacher-structured.
Authentic: Involves and engages the students in the teaching, learning, and assessment process; student-structured.

Traditional: Provides limited ways for students to demonstrate what they have learned.
Authentic: Provides multiple avenues for students to demonstrate best what they have learned.

Traditional examples: true-or-false tests, multiple-choice tests, standardized tests, achievement tests, intelligence tests, aptitude tests.
Authentic examples: demonstrations, hands-on experiments, computer simulations, portfolios, projects, multimedia presentations, role plays, recitals, stage plays, exhibits.
Advantages of Traditional Assessment Over Authentic Assessment:
Traditional assessments do have advantages over authentic assessments:

Traditional (advantage): Takes less time and is easier to prepare; easy to administer.
Authentic (disadvantage): Time consuming and labor intensive; sometimes the time and effort spent exceed the benefits.

Traditional (disadvantage): Provides teachers with just a snapshot of what the students have truly learned.
Authentic (advantage): Provides teachers with a true picture of how and where their students are in their learning; gives more information about their students' strengths, weaknesses, needs, and preferences, which aids them in adjusting instruction towards enhanced teaching and learning.

Traditional: Reveals and strengthens only the students' low-level cognitive skills: knowledge and comprehension.
Authentic: Reveals and enriches the students' high-level cognitive skills: from knowledge and comprehension to analysis, synthesis, application, and evaluation.

Traditional: Assesses only the lower-level thinking/cognitive skills; focuses only on the students' ability to memorize and recall information.
Authentic: Enhances students' ability to apply skills and knowledge to real-life situations; taps higher-order cognitive and problem-solving skills.

Traditional: Teacher-structured: teachers direct and act as evaluators; students merely answer the assessment tool.
Authentic: Student-structured: students are more engaged in their learning; assessment results guide instruction.

Traditional: Invokes feelings of anxiety detrimental to learning.
Authentic: Reduces anxiety and creates a more relaxed, happy atmosphere that boosts learning.
Whether a test is standardized or teacher-made, it should possess the qualities of a good measuring instrument. This module discusses the qualities of a good test: validity, reliability, and usability.
6. enumerate and discuss the factors that determine the usability of a test; and
Validity
Validity is the most important characteristic of a good test. It refers to the extent to which the test serves its purpose, or the efficiency with which it measures what it intends to measure.
The validity of a test concerns what the test measures and how well it does so. For example, to judge the validity of a test, it is necessary to consider what behavior the test is supposed to measure.
A test may yield consistent scores, but if it is not useful for its purpose, then it is not valid. For example, a test intended for Grade V students is not valid when given to Grade IV students.
Validity is classified into four types: content validity, concurrent validity, predictive validity, and construct validity.
Content validity – the extent to which the content of the test is truly representative of the content of the course. A well-constructed achievement test should cover the objectives of instruction, not just its subject matter. Three domains of behavior are included: cognitive, affective, and psychomotor.
Concurrent validity – the degree to which the test agrees or correlates with a criterion set up as an acceptable measure. The criterion is always available at the time of testing. Concurrent validity, a form of criterion-related validity, uses statistical tools to interpret and correlate test results.
For example, a teacher wants to validate an achievement test in Science (X) that he constructed. He administers this test to his students. The results of this test can be compared with those of another Science test (Y) that has already been proven valid. If the correlation between X and Y is high, the achievement test in Science is valid. According to Garrett, a highly reliable test is always a valid measure of some function.
Predictive validity – evaluated by relating the test to some later, actual achievement of the students that the test is supposed to predict. The criterion measure for this type is important because the future outcome of the testee is being predicted; the criterion measures against which the test scores are validated become available only after a long period.
Construct validity – the extent to which the test measures a theoretical trait. Test items must include factors that make up a psychological construct such as intelligence, critical thinking, reading comprehension, or mathematical aptitude.
Several factors affect the validity of a test:
1. Inappropriateness of test items – items that measure knowledge cannot measure skill.
2. Directions – unclear directions reduce validity. Directions that do not clearly indicate how the pupils should answer and record their answers affect the validity of test items.
3. Reading vocabulary and sentence structure – vocabulary and sentence structures that are too difficult or complicated will keep the test from measuring what it intends to measure.
4. Level of difficulty of items – test items that are too difficult or too easy cannot discriminate between bright and slow pupils, which lowers validity.
5. Poorly constructed test items – items that provide clues, or that are ambiguous, confuse the students and will not yield a true measure.
6. Length of the test – a test should be of sufficient length to measure what it is supposed to measure. A test that is too short cannot adequately sample the performance we want to measure.
7. Arrangement of items – test items should be arranged by difficulty, from the easiest to the most difficult. Difficult items encountered early may cause a mental block and may cause students to spend too much time on one item.
8. Patterns of answers – when students can detect a pattern in the correct answers, they are liable to guess, and this lowers validity.
Reliability
Reliability means consistency and accuracy. It refers to the extent to which a test is dependable, self-consistent, and stable. In other words, the test agrees with itself: it is concerned with the consistency of responses from moment to moment, so that even if a person takes the same test twice, the test yields the same results.
For example, if a student scores 90 on an English achievement test on Monday and 30 on the same test given on Friday, then neither score can be relied upon.
Inconsistency of individual scores may be caused by the person scoring the test, by limited sampling of certain areas of the subject matter, and particularly by the examinee himself; if the examinee's mood is unstable, this may affect his score.
Several factors affect the reliability of a test:
1. Length of the test. As a general rule, the longer the test, the higher the reliability. A longer test provides a more adequate sample of the behavior being measured and is less distorted by chance factors such as guessing.
2. Difficulty of the test. When a test is too easy or too difficult, it cannot show the differences among individuals; thus it is unreliable. Ideally, achievement tests should be constructed such that the average score is 50 percent correct and the scores range from near zero to near perfect.
3. Objectivity. Objectivity eliminates the bias, opinions, or judgments of the person who scores the test. Reliability is greater when tests can be scored objectively.
4. Heterogeneity of the student group. Reliability is higher when test scores are spread over a wide range of abilities; relative to the spread of scores, measurement errors are smaller for a heterogeneous group than for a homogeneous one.
5. Limited time. A test in which speed is a factor is more reliable than a test administered with a longer time allowance.
1. Test-retest method. The same test is administered twice to the same group of students, and the two sets of scores are correlated. The Spearman rank-order correlation coefficient (rho) may be used:

rs = 1 – 6ΣD² / (N³ – N)

where D = difference between the ranks of the paired scores and N = number of students.

For example, 10 students were used as samples to test the reliability of an achievement test in Biology. After two administrations of the test, the data and the computation of Spearman rho are presented in the table below:

Student | S1 | S2 | R1 | R2 | D | D²
3 | 77 | 76 | 9 | 9 | 0 | 0
6 | 87 | 85 | 3 | 4 | 1.0 | 1.0
8 | 73 | 72 | 10 | 10 | 0 | 0
Total ΣD² = 3.5

rs = 1 – 6ΣD² / (N³ – N)
= 1 – 6(3.5) / (10³ – 10)
= 1 – 21 / 990
= 1 – 0.0212
= 0.98

The rs value obtained is 0.98, which indicates a very high relationship; hence the achievement test in Biology is reliable.
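As a quick check on the hand computation, the same formula can be sketched in Python (the function name is illustrative):

```python
def spearman_rho(sum_d_squared, n):
    """Spearman rank-order correlation: rs = 1 - 6*sum(D^2) / (N^3 - N)."""
    return 1 - (6 * sum_d_squared) / (n ** 3 - n)

# With the totals from the Biology example above: sum(D^2) = 3.5, N = 10
print(round(spearman_rho(3.5, 10), 2))  # 0.98
```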
The Pearson product-moment correlation coefficient can also be used for the test-retest method of estimating the reliability of a test. The formula is:

r = (NΣXY – ΣX·ΣY) / √[(NΣX² – (ΣX)²)(NΣY² – (ΣY)²)]

Using the same data as for the Spearman rho, the scores from the first and second administrations may be tabulated in this way:

X (S1) | Y (S2) | X² | Y² | XY

Could you now compute r by using the formula above? Illustrate below:
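The raw-score Pearson formula can be transcribed directly into code, as a sketch; the two score lists below are hypothetical, since the full test-retest data are not reproduced here:

```python
import math

def pearson_r(x, y):
    """Raw-score Pearson r: (N*ΣXY - ΣX*ΣY) / sqrt((N*ΣX² - (ΣX)²)(N*ΣY² - (ΣY)²))."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

# hypothetical first (X) and second (Y) administrations of the same test
print(round(pearson_r([68, 65, 70, 72, 60], [71, 65, 69, 74, 62]), 2))  # 0.94
```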
2. Alternate-forms method. This is the second method of establishing the reliability of test results. Two forms of a test, similar in content, type of items, difficulty, and other respects, are given in close succession to the same group of students. To test the reliability, the correlation technique is used (refer to the Pearson product-moment correlation coefficient above).
3. Split-half method. The test is administered once, but the test items are divided into two halves; the most common procedure is to divide the test into odd- and even-numbered items. The two sets of results are correlated, and the r obtained is the reliability coefficient of a half test. The Spearman-Brown formula is then used to estimate the reliability of the whole test:

rt = 2rht / (1 + rht)
= 2(0.69) / (1 + 0.69)
= 1.38 / 1.69
= 0.82

The split-half method is applicable only to measuring instruments that are not highly speeded. If the measuring instrument includes easy items and the subject is able to answer correctly all or nearly all items within the time limit of the test, the scores on the two halves would be nearly identical and the correlation would be close to +1.00.
4. Kuder-Richardson Formula 21. This is the last method of establishing the reliability of a test. Like the split-half method, the test is administered only once. The method assumes that all items are of equal difficulty. The formula is:

KR21 = (k / (k – 1)) × (1 – M(k – M) / (k·s²))

where:
k = number of items in the test
M = mean of the scores
s² = variance of the scores

Example: Mr. Marvin administered a 50-item test to 10 of his Grade 5 pupils. The scores of his pupils are presented in the table below:

Pupil | Score (X) | X – X̄ | (X – X̄)²
A | 32 | 3.2 | 10.24
B | 36 | 7.2 | 51.84
C | 36 | 7.2 | 51.84
D | 22 | –6.8 | 46.24
E | 38 | 9.2 | 84.64
F | 15 | –13.8 | 190.44
G | 43 | 14.2 | 201.64
H | 25 | –3.8 | 14.44
I | 18 | –10.8 | 116.64
J | 23 | –5.8 | 33.64
Total | 288 | | 801.60

X̄ = 28.8   s² = 89.07   k = 50

Show how the mean and the variance were obtained in the box below:
Could you now compute the reliability of the test by applying Kuder-Richardson Formula 21? Please try!
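The arithmetic the example calls for can be sketched in Python; note that the variance here uses the n − 1 denominator, which reproduces the 89.07 value shown above:

```python
def kr21(scores, k):
    """Kuder-Richardson Formula 21: (k/(k-1)) * (1 - M*(k-M) / (k*s^2))."""
    n = len(scores)
    mean = sum(scores) / n
    variance = sum((x - mean) ** 2 for x in scores) / (n - 1)  # n - 1 denominator
    return (k / (k - 1)) * (1 - mean * (k - mean) / (k * variance))

scores = [32, 36, 36, 22, 38, 15, 43, 25, 18, 23]  # Mr. Marvin's 10 pupils
print(round(kr21(scores, 50), 2))  # 0.88
```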
Usability
Usability means the degree to which a test can be used without much expenditure of time, money, and effort; it also means practicability. The factors that determine usability are administrability, scorability, interpretability, economy, and the proper mechanical make-up of the test.
Administrability means that the test can be administered with ease, clarity, and uniformity. Directions must be simple, clear, and concise; time limits, oral instructions, and sample questions are specified; and provisions for the preparation, distribution, and collection of test materials must be definite.
Scorability is concerned with the scoring of the test. A good test is easy to score: the scoring directions are clear, the scoring key is simple, an answer key is available, and machine scoring is made possible as far as practicable.
Interpretability: test results are useful only if they are interpreted after evaluation. Correct interpretation and application of test results are essential for sound educational decisions.
An economical test is of low cost. One way to economize is to use separate answer sheets so that test booklets can be reused. However, test validity and reliability should not be sacrificed for economy.
Proper mechanical make-up of the test concerns how the test is printed, what font size is used, and whether the illustrations fit the level of the pupils/students.
Summary
A good measuring instrument possesses three qualities: validity, reliability, and usability.
Validity is the extent to which a test measures what it intends to measure. It has four types: content, construct, concurrent, and predictive. Test validity can be affected by inappropriateness of the test items, directions, vocabulary and sentence construction, level of difficulty, poorly constructed items, length of the test, arrangement of items, and patterns of answers.
Reliability is the consistency of scores obtained by an individual given the same test at different times. It can be estimated using the test-retest method, alternate forms, split-half, and Kuder-Richardson Formula 21. The reliability of a test may be affected by the length of the test, the difficulty of the test items, the objectivity of scoring, the heterogeneity of the student group, and limited time.
Usability of a test means its practicability. This quality is determined by ease of administration (administrability), ease of scoring (scorability), ease of interpretation and application (interpretability), economy of materials, and the proper mechanical make-up of the test.
A test, to be effective, must be valid, for a valid test is always reliable, but not every reliable test is valid.
Learning Exercises
2. A teacher-made test overemphasized facts and underemphasized the other objectives of the course for which it was designed. What can be said about the test?
3. When an achievement test for Grade V pupils was administered to Grade VI pupils, what is most affected?
a. scorability c. economy
II. Multiple Choice: Table 1 presents the scores of 10 students who were tested twice (test-retest) to estimate the reliability of the test. Complete the table and answer the questions below. Choose the correct answer and show the computation where needed.
Student | S1 | S2 | R1 | R2 | D | D²
1 68 71
2 65 65
3 70 69
4 65 68
5 70 72
6 65 63
7 62 62
8 64 66
9 58 60
10 60 60
Total =
a. 13 b. 14 c. 15 d. 16
b. student 3 d. student 7
5. Based on Garrett's interpretation of the calculated rs, what can you say about the test constructed?
1. Mr. Gwen administered a 40-item Mathematics test to his 10 students. Their scores on the first half and the second half are shown below. Find the reliability of the whole test using the split-half method. Is the test reliable? Justify.
1st half – 17 18 20 11 10 13 20 19 19 15
2nd half – 15 13 18 10 8 10 18 16 17 14
2. Ms. Pearl administered a 30-item Science test to her Grade 6 pupils. The scores are shown below. What is the reliability of the whole test using Kuder-Richardson Formula 21? Is the test reliable? Justify.
Pupils – A B C D E F G H I J K L
Scores – 25 22 30 22 17 15 18 24 27 18 23 26
IV. Essay:
1. In your own opinion, which is better a valid test or reliable test? Why?
2. Why do you think students’ score in a particular test sometimes vary?
References
Concepts and Principles, 1st Ed. Manila, Philippines: Rex Book Store,
Inc. 2004.
Calmorin, Laurentina. Educational Research, Measurement and Evaluation, 2nd Ed. Metro Manila,
Philippines: National Book Store, Inc. 1994.
Oriondo, Leonora L. and Eleonor M. Antonio. Evaluating Educational Outcomes. Manila, Philippines: Rex Book Store, 1984.