
Traditional assessments refer to conventional methods of testing, usually standardized, that use pen and paper with multiple-choice, true-or-false, or matching test items.

Authentic assessments refer to assessments wherein students are asked to perform real-world tasks that demonstrate meaningful application of what they have learned.
To better compare traditional and authentic assessments, here’s a table I prepared:

Traditional Assessment vs. Authentic Assessment

Traditional: Purpose – to evaluate if the students have learned the content; to determine whether or not the students are successful in acquiring knowledge; to ascribe a grade to them; to rank and compare them against standards or other learners.
Authentic: Purpose – to measure students’ proficiency by asking them to perform real-life tasks; to provide students many avenues to learn and demonstrate best what they have learned; to guide instruction; to provide feedback and help students manage their own learning; to also evaluate students’ competency.

Traditional: Provides teachers a snapshot of what the students know.
Authentic: Provides teachers a more complete picture of what the students know and what they can do with what they know.

Traditional: Measures students’ knowledge of the content.
Authentic: Measures students’ ability to apply knowledge of the content in real-life situations; their ability to use/apply what they have learned in meaningful ways.

Traditional: Requires students to demonstrate knowledge by selecting a response/giving correct answers; usually tests students’ proficiency through paper-and-pencil tests. Students are asked to choose an answer from a set of options (true or false; multiple choice) to test knowledge of what has been taught.
Authentic: Requires students to demonstrate proficiency by performing relevant tasks showing application of what has been learned.

Traditional: Provides indirect evidence of learning.
Authentic: Provides direct evidence of learning/competency; direct demonstration of knowledge and skills by performing relevant tasks.

Traditional: Requires students to practice the cognitive ability to recall/recognize/reconstruct a body of knowledge that has been taught.
Authentic: Provides opportunities for students to construct meaning/new knowledge out of what has been taught.

Traditional: Tests and strengthens the students’ ability to recall/recognize and comprehend content, but does not reveal the students’ true progress in what they can do with the knowledge they acquired. Only the students’ lower-level thinking skills (knowledge and comprehension) are tapped.
Authentic: Tests and strengthens the students’ ability to reason, analyze, synthesize, and apply knowledge acquired; the students’ higher-level cognitive skills (from knowledge and comprehension to analysis, synthesis, application, and evaluation) are tapped in multiple ways.

Traditional: Hides the test.
Authentic: Teaches the test.

Traditional: Teachers serve as evaluators and students as the evaluatees; teacher-structured.
Authentic: Involves and engages the students in the teaching, learning, and assessment process; student-structured.

Traditional: Assessment is separated from teaching and learning. The test usually comes after instruction to evaluate if the students have successfully learned the content.
Authentic: Assessment is integrated with instruction. Assessment activities happen all throughout instruction to help students improve their learning and help teachers improve their teaching.

Traditional: Provides limited ways for students to demonstrate what they have learned.
Authentic: Provides multiple avenues for students to demonstrate best what they have learned.

Traditional: Rigid and fixed.
Authentic: Flexible; provides multiple acceptable ways of constructing products or performances as evidence of learning.

Traditional: Standardized; valid and reliable.
Authentic: Needs well-defined criteria/rubrics and standards to achieve reliability and validity.

Traditional: Curriculum drives assessment.
Authentic: Assessment drives curriculum and instruction.

Traditional examples: true-or-false and multiple-choice tests; standardized tests; achievement tests; intelligence tests; aptitude tests.
Authentic examples: demonstrations; hands-on experiments; computer simulations; portfolios; projects; multimedia presentations; role plays; recitals; stage plays; exhibits.

Advantages of Traditional Assessment Over Authentic Assessment:
Traditional assessments do have advantages over authentic assessments:

Traditional Assessment (Advantages) vs. Authentic Assessment (Disadvantages)

Traditional: Easy to score; teachers can evaluate students more quickly and easily.
Authentic: Harder to evaluate.

Traditional: Takes less time and is easier to prepare; easy to administer.
Authentic: Time-consuming and labor-intensive; sometimes the time and effort spent exceed the benefits.

Traditional: Objective, reliable, and valid.
Authentic: Susceptible to unfairness and subjectivity; lacks objectivity, reliability, and validity if not properly guided by well-defined, clear criteria, rubrics, or standards.

Traditional: Economical.
Authentic: Less economical.

Advantages of Authentic Assessment Over Traditional Assessment
On the other hand, here are the advantages of authentic assessment over the
traditional assessment:
Traditional Assessment (Disadvantages) vs. Authentic Assessment (Advantages)

Traditional: Provides teachers with just a snapshot of what the students have truly learned.
Authentic: Provides teachers with a true picture of how and where their students are in their learning; gives more information about their students’ strengths, weaknesses, needs, and preferences, which aids them in adjusting instruction towards enhanced teaching and learning.

Traditional: Provides students limited options to demonstrate what they have learned, usually limited to pencil-and-paper tests.
Authentic: Provides students many alternatives/ways to demonstrate best what they have learned; offers a wide array of interesting and challenging assessment activities.

Traditional: Assessment is separate from instruction.
Authentic: Assessment is integrated with instruction.

Traditional: Reveals and strengthens only the students’ low-level cognitive skills: knowledge and comprehension.
Authentic: Reveals and enriches the students’ high-level cognitive skills: from knowledge and comprehension to analysis, synthesis, application, and evaluation.

Traditional: Assesses only the lower-level thinking/cognitive skills; focuses only on the students’ ability to memorize and recall information.
Authentic: Enhances students’ ability to apply skills and knowledge to real-life situations; taps higher-order cognitive and problem-solving skills.

Traditional: Hides the test.
Authentic: Teaches the test.

Traditional: Teacher-structured: teachers direct and act as evaluators; students merely answer the assessment tool.
Authentic: Student-structured: students are more engaged in their learning; assessment results guide instruction.

Traditional: Involves students working alone; promotes competitiveness.
Authentic: Oftentimes involves students working in groups, hence promotes teamwork and collaborative and interpersonal skills.

Traditional: Invokes feelings of anxiety detrimental to learning.
Authentic: Reduces anxiety and creates a more relaxed, happy atmosphere that boosts learning.

Traditional: Time is fixed and limited; students are time-pressured to finish the test.
Authentic: Time is flexible.

Traditional: Focuses on one form of intelligence.
Authentic: Focuses on the growth of the learner; learners express their understanding of the learning content using their preferred multiple forms of intelligence; provides parents and the community with more observable products and proofs of the students’ learning, which motivate them to support their kids’ learning more.
Module 2 – Qualities of a Good Measuring Instrument


Whether a test is standardized or teacher-made, it should possess the qualities of a good measuring
instrument. This module discusses the qualities of a good test: validity, reliability, and
usability.

After reading this module, students are expected to:

1. define and explain the characteristics of a good measuring instrument;

2. identify the types of validity;

3. describe what conditions can affect the validity of test items;

4. discuss the factors that affect the reliability of a test;

5. estimate test reliability using different methods;

6. enumerate and discuss the factors that determine the usability of a test; and

7. point out which is the most important characteristic of a good test.

Validity
Validity is the most important characteristic of a good test. It refers to the extent to which the
test serves its purpose, or the efficiency with which it measures what it intends to measure.

The validity of a test concerns what the test measures and how well it does so. For example, in order to
judge the validity of a test, it is necessary to consider what behavior the test is supposed to measure.

A test may yield consistent scores, but if it is not useful for its purpose, then it is not valid. For
example, a test for grade V students given to grade IV students is not valid.

Validity is classified into four types: content validity, concurrent validity, predictive validity, and
construct validity.

Content validity – the extent to which the content of the test is truly representative of the
content of the course. A well-constructed achievement test should cover the objectives of instruction,
not just its subject matter. Three domains of behavior are included: cognitive, affective, and
psychomotor.

Concurrent validity – the degree to which the test agrees or correlates with a criterion set up as an
acceptable measure. The criterion is always available at the time of testing.

Concurrent validity, or criterion-related validity, uses statistical tools to interpret and correlate test
results.

For example, a teacher wants to validate an achievement test in Science (X) that he constructed. He
administers this test to his students. The results can be compared with those of another Science test
(Y) that has been proven valid. If the correlation between X and Y is high, the achievement test in
Science is valid. According to Garrett, a highly reliable test is always a valid measure of some
function.

Predictive validity – evaluated by relating the test to some actual achievement of the students that
the test is supposed to predict. This type is important because the future outcome of the testee is
predicted. The criterion measures against which the test scores are validated are obtained only after
a long period.
Construct validity – the extent to which the test measures a theoretical trait. Test items must include
factors that make up a psychological construct such as intelligence, critical thinking, reading
comprehension, or mathematical aptitude.

Factors that influence validity are:

1. Inappropriateness of test items – items that measure knowledge cannot measure skill.

2. Directions – unclear directions reduce validity. Directions that do not clearly indicate how the pupils
should answer and record their answers affect the validity of test items.

3. Reading vocabulary and sentence structure – vocabulary and sentence structures that are too difficult
or complicated prevent the test from measuring what it intends to measure.

4. Level of difficulty of items – test items that are too difficult or too easy cannot discriminate
between bright and slow pupils, which lowers validity.

5. Poorly constructed test items – test items that provide clues, or that are ambiguous, confuse the
students and will not yield a true measure.

6. Length of the test – a test should be of sufficient length to measure what it is supposed to measure.
A test that is too short cannot adequately sample the performance we want to measure.

7. Arrangement of items – test items should be arranged by difficulty, from the easiest to the most
difficult. Difficult items encountered early may cause a mental block and may also cause students to
spend too much time on those items.

8. Patterns of answers – when students can detect a pattern in the correct answers, they are liable to
guess, and this lowers validity.

Reliability

Reliability means consistency and accuracy. It refers to the extent to which a test is dependable,
self-consistent, and stable; in other words, the test agrees with itself. It is concerned with the
consistency of responses from moment to moment: even if a person takes the same test twice, the test
should yield the same results.

For example, if a student scores 90 on an English achievement test on Monday and 30 on the same test
given on Friday, then neither score can be relied upon.

Inconsistency of individual scores may be caused by the person scoring the test, by limited sampling
of certain areas of the subject matter, and particularly by the examinee himself. If the examinee’s
mood is unstable, this may affect his score.

Factors that affect reliability are:

1. Length of the test. As a general rule, the longer the test, the higher the reliability. A longer test
provides a more adequate sample of the behavior being measured and is less distorted by chance
factors like guessing.

2. Difficulty of the test. When a test is too easy or too difficult, it cannot show the differences among
individuals; thus it is unreliable. Ideally, achievement tests should be constructed such that the average
score is 50 percent correct and the scores range from near zero to near perfect.

3. Objectivity. Objectivity eliminates the bias, opinions or judgments of the person who checks the test.
Reliability is greater when test can be scored objectively.

4. Heterogeneity of the student group. Reliability is higher when test scores are spread over a wide
range of abilities. Relative to the spread of scores, measurement errors loom larger in a homogeneous
group than in one that is more heterogeneous.

5. Limited time. A test in which speed is a factor is more reliable than one administered with ample
time.

A reliable test however, is not always valid.

Methods of Estimating Reliability of Test:


1. Test-retest method. The same instrument is administered twice to the same group of subjects. The
correlation between the scores of the first and second administrations is determined using the Spearman
rank correlation coefficient (Spearman rho) or the Pearson Product-Moment Correlation Coefficient.

The formula using Spearman rho is:

rs = 1 – (6ΣD²) / (N³ – N)

where: ΣD² = sum of the squared differences between ranks
N = total number of cases

For example, 10 students were used as samples to test the reliability of an achievement test in
Biology. After two administrations of the test, the data and the computation of Spearman rho are
presented in the table below:

Students   S1   S2   R1    R2    D     D²

1          89   90   2     1.5   0.5   0.25
2          85   85   4.5   4     0.5   0.25
3          77   76   9     9     0     0
4          80   81   7.5   8     0.5   0.25
5          83   83   6     6.5   0.5   0.25
6          87   85   3     4     1.0   1.00
7          90   90   1     1.5   0.5   0.25
8          73   72   10    10    0     0
9          85   85   4.5   4     0.5   0.25
10         80   83   7.5   6.5   1.0   1.00

Total: ΣD² = 3.5
rs = 1 – (6ΣD²) / (N³ – N)

   = 1 – 6(3.5) / (10³ – 10)

   = 1 – 21/990

   = 1 – 0.0212

   = 0.98 (very high relationship)

The rs value obtained is 0.98, which indicates a very high relationship; hence the achievement test in
Biology is reliable.
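If you want to check this computation, here is a small Python sketch (not part of the original module; the average_ranks helper is a hypothetical function written to mirror the hand method, where the highest score gets rank 1 and tied scores share the average of their rank positions):

```python
# Spearman rho for the Biology test-retest data above.
def average_ranks(scores):
    """Rank scores from highest (rank 1) downward, averaging tied ranks."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # extend j to cover every score tied with the group starting at i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = ((i + 1) + (j + 1)) / 2  # average of the tied rank positions
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

s1 = [89, 85, 77, 80, 83, 87, 90, 73, 85, 80]  # first administration
s2 = [90, 85, 76, 81, 83, 85, 90, 72, 85, 83]  # second administration
r1, r2 = average_ranks(s1), average_ranks(s2)
sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
n = len(s1)
rs = 1 - 6 * sum_d2 / (n ** 3 - n)
print(round(sum_d2, 2), round(rs, 2))  # 3.5 0.98
```

The output matches the hand computation above: ΣD² = 3.5 and rs = 0.98.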

The Pearson Product-Moment Correlation Coefficient can also be used for the test-retest method of
estimating the reliability of a test. The formula is:

r = [NΣXY – (ΣX)(ΣY)] / √{[NΣX² – (ΣX)²][NΣY² – (ΣY)²]}

Using the same data as for Spearman rho, the scores for the 1st and 2nd administrations may be presented
this way:

X (S1)   Y (S2)   X²      Y²      XY

89       90       7921    8100    8010

85       85       7225    7225    7225

77       76       5929    5776    5852

80       81       6400    6561    6480

83       83       6889    6889    6889

87       85       7569    7225    7395

90       90       8100    8100    8100

73       72       5329    5184    5256

85       85       7225    7225    7225

80       83       6400    6889    6640

ΣX = 829   ΣY = 830   ΣX² = 68987   ΣY² = 69174   ΣXY = 69072

Could you now compute by using the formula above? Illustrate below:
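As a sketch you can check your answer against (note that 83² = 6889, so ΣX² = 68987, ΣY² = 69174, and ΣXY = 69072), the same Pearson r can be computed directly from the raw scores in Python:

```python
# Pearson product-moment correlation for the same test-retest scores.
x = [89, 85, 77, 80, 83, 87, 90, 73, 85, 80]  # first administration
y = [90, 85, 76, 81, 83, 85, 90, 72, 85, 83]  # second administration
n = len(x)
sx, sy = sum(x), sum(y)
sxx = sum(v * v for v in x)                    # sum of X^2
syy = sum(v * v for v in y)                    # sum of Y^2
sxy = sum(a * b for a, b in zip(x, y))         # sum of XY
r = (n * sxy - sx * sy) / ((n * sxx - sx ** 2) * (n * syy - sy ** 2)) ** 0.5
print(round(r, 2))  # 0.97
```

Like the Spearman rho of 0.98, this is a very high relationship, again indicating a reliable test.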

2. Alternate-forms method. This is the second method of establishing the reliability of test results. In
this method, we give two forms of a test, similar in content, type of items, difficulty, and so on, in
close succession to the same group of students. To estimate the reliability, the correlation technique
is used (refer to the Pearson Product-Moment Correlation Coefficient formula above).

3. Split-half method. The test is administered only once, but the test items are divided into two halves.
The most common procedure is to divide the test into odd and even items. The results of the two halves
are correlated, and the r obtained is the reliability coefficient for a half test. The Spearman-Brown
formula is then used:

rt = 2rht / (1 + rht)

where: rt = reliability of the whole test
rht = reliability of half of the test

For example, if rht is 0.69, what is rt?

rt = 2rht / (1 + rht)

   = 2(0.69) / (1 + 0.69)

   = 1.38 / 1.69

   = 0.82 (very high relationship, so the test is reliable)

The split-half method is applicable to measuring instruments that are not highly speeded. If the
measuring instrument includes easy items and the subjects are able to answer correctly all or nearly
all items within the time limit of the test, the scores on the two halves will be about the same, and
the correlation will be close to +1.00.
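The Spearman-Brown step-up can be sketched in Python as follows (a minimal illustration of the worked example above, not a general split-half program):

```python
# Spearman-Brown correction: steps the half-test reliability coefficient
# up to the reliability of the full-length test.
def spearman_brown(r_half):
    return 2 * r_half / (1 + r_half)

# The worked example: a half-test reliability of 0.69
print(round(spearman_brown(0.69), 2))  # 0.82
```

In practice you would first correlate the odd-item and even-item half scores (e.g., with the Pearson formula shown earlier) to obtain r_half, then apply this correction.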

4. Kuder-Richardson Formula 21 is the last method of establishing the reliability of a test. Like the
split-half method, the test is administered only once. This method assumes that all items are of equal
difficulty. The formula is:

rt = [k / (k – 1)] [1 – X̄(k – X̄) / (kS²)]

where:

X̄ = the mean of the obtained scores

S² = the variance of the scores (the square of the standard deviation S)

k = the total number of items

Example: Mr. Marvin administered a 50-item test to 10 of his grade 5 pupils. The scores of his pupils are
presented in the table below:

Pupils   Score (X)   X – X̄   (X – X̄)²

A        32           3.2     10.24

B        36           7.2     51.84

C        36           7.2     51.84

D        22          -6.8     46.24

E        38           9.2     84.64

F        15         -13.8    190.44

G        43          14.2    201.64

H        25          -3.8     14.44

I        18         -10.8    116.64

J        23          -5.8     33.64

ΣX = 288             Σ(X – X̄)² = 801.60

X̄ = 28.8   S² = 801.60/9 = 89.07   k = 50

Show how the mean and the standard deviation were obtained in the box below:

Could you now compute the reliability of the test by applying Kuder-Richardson Formula 21?
Please try!
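After you have tried it by hand, here is a Python sketch you can check your answer against (it assumes, as in the table above, that S² is computed with n – 1 in the denominator, giving 89.07):

```python
# KR-21 reliability for Mr. Marvin's 50-item test.
scores = [32, 36, 36, 22, 38, 15, 43, 25, 18, 23]
k = 50                                                  # number of items
n = len(scores)
mean = sum(scores) / n                                  # X-bar = 28.8
var = sum((x - mean) ** 2 for x in scores) / (n - 1)    # S^2, about 89.07
r = (k / (k - 1)) * (1 - mean * (k - mean) / (k * var))  # KR-21
print(round(mean, 1), round(var, 2), round(r, 2))  # 28.8 89.07 0.88
```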

Usability

Usability means the degree to which a test can be used without much expenditure of time, money, and
effort. It also means practicability. The factors that determine usability are: administrability,
scorability, interpretability, economy, and the proper mechanical make-up of the test.

Administrability means that the test can be administered with ease, clarity and uniformity. Directions
must be made simple, clear and concise. Time limits, oral instructions and sample questions are
specified. Provisions for preparation, distribution, and collection of test materials must be definite.

Scorability concerns the scoring of the test. A good test is easy to score: the scoring directions are
clear, the scoring key is simple, answer sheets are available, and machine scoring is made possible as
much as possible.

Test results are useful only if they are interpreted after evaluation. Correct interpretation and
application of test results is essential for sound educational decisions.

An economical test is of low cost. One way to economize is to use answer sheets and reusable test
booklets. However, test validity and reliability should not be sacrificed for economy.

Proper mechanical make-up of the test concerns how tests are printed, what font sizes are used, and
whether the illustrations fit the level of the pupils/students.
Summary

A good measuring instrument possesses three qualities: validity, reliability, and usability.

Validity – the extent to which a test measures what it intends to measure. It has four types: content,
construct, concurrent, and predictive. Test validity can be affected by: inappropriateness of the test
items, directions, vocabulary and sentence structure, level of difficulty, poor construction of items,
length of the test, arrangement of items, and patterns of answers.

Reliability is the consistency of scores obtained by an individual given the same test at different
times. It can be estimated using the test-retest method, alternate forms, the split-half method, and
Kuder-Richardson Formula 21. The reliability of a test may be affected by the length of the test, the
difficulty of the test items, the objectivity of scoring, the heterogeneity of the student group, and
limited time.

Usability of test means its practicability. This quality is determined by: ease in administration
(administrability), ease in scoring (scorability), ease in interpretation and application (interpretability),
economy of materials and the proper mechanical make-up of the test.

To be effective, a test must be valid, for a valid test is always reliable, but not every reliable test is valid.

Learning Exercises

I. Multiple Choice: Encircle the correct answer.

1. Which statement concerning validity and reliability is most accurate?

a. A test can not be reliable unless it is valid.

b. A test can not be valid unless it is reliable.

c. A test can not be valid and reliable unless it is objective.

d. A test can not be valid and reliable unless it is standardized.

2. Which type of validity is appropriate for a criterion-referenced measure?

a. content validity c. construct validity


b. concurrent validity d. predictive validity

3. Which is directly affected by objectivity in scoring?

a. The validity of test c. The reliability of test

b. The usability of test d. The administrability of test

4. A teacher-made test overemphasizes facts and underemphasizes other objectives of the course for
which it was designed. What can be said about the test?

a. It lacks content validity.

b. It lacks construct validity.

c. It lacks predictive validity.

d. It lacks criterion-related validity.

5. When an achievement test for grade V pupils is administered to grade VI pupils, what is most affected?

a. reliability of the test c. usability of the test

b. validity of the test d. reliability and validity of the test

6. Which factor of usability is described by the wise use of testing materials?

a. scorability c. economy

b. administrability d. proper mechanical make-up

7. Clarity and uniformity in giving directions affect:

a. scorability of the test c. interpretability of the test

b. administrability of the test d. proper mechanical make-up

8. Which best describes validity?

a. consistency in test result


b. practicability of the test

c. homogeneity in the content of the test

d. objectivity in administration and scoring of test

II. Setting and Option Multiple Choice: Table 1 presents the scores of 10 students who were tested twice
(test-retest) to estimate the reliability of the test. Complete the table and answer the questions below.
Choose the correct answer and show the computation where it is needed.

Students   S1   S2   R1   R2   D   D²

1 68 71

2 65 65

3 70 69

4 65 68

5 70 72

6 65 63

7 62 62

8 64 66

9 58 60

10 60 60

Total =

1. What is the sum of the squared differences between ranks (ΣD²)?

a. 13 b. 14 c. 15 d. 16

2. Who got the highest score in the second administration of test?


a. student 1 c. student 5

b. student 3 d. student 7

3. What is the calculated rs?

a. 0.86 b. 0.88 c. 0.90 d. 0.92

4. What is Garrett’s interpretation of the obtained rs (refer to question no. 3)?

a. negligible correlation c. marked relationship

b. low correlation d. high or very high relationship

5. Based on Garrett’s interpretation of the calculated rs, what can you say about the test constructed?

III. Simple Recall:

1. Mr. Gwen administered a 40-item Mathematics test to his 10 students. Their scores on the first half
and on the second half are shown below. Find the reliability of the whole test using the split-half
method. Is the test reliable? Justify.

1st half – 17 18 20 11 10 13 20 19 19 15

2nd half – 15 13 18 10 8 10 18 16 17 14

2. Ms. Pearl administered a 30-item Science test to her Grade 6 pupils. The scores are shown below.
What is the reliability of the whole test using Kuder-Richardson Formula 21? Is the test reliable? Justify.

Pupils – A B C D E F G H I J K L

Scores – 25 22 30 22 17 15 18 24 27 18 23 26

IV. Essay:

1. In your own opinion, which is better a valid test or reliable test? Why?
2. Why do you think students’ score in a particular test sometimes vary?

3. What makes test items/test results invalid? Discuss.

References

Asaad, Abubakar S. and Wilham M. Hailaya. Measurement and Evaluation: Concepts and Principles, 1st Ed.
Manila, Philippines: Rex Book Store, Inc., 2004.

Calmorin, Laurentina. Educational Research, Measurement and Evaluation, 2nd Ed. Metro Manila,
Philippines: National Book Store, Inc. 1994.

Oriondo, Leonora L. and Eleonor M. Antonio. Evaluating Educational Outcomes. Manila, Philippines: Rex
Book Store, 1984.
