Evaluation, Measurement and Assessment Cluster 14
Cluster 14
Basic Terminology
• Evaluation: making judgments and decisions about
performance
• Measurement: a number representing an evaluation
• Assessment: procedures used to gather information (there is
a variety of them)
• Norm-referenced testing: Testing in which scores are
compared with the average performance of others
• Criterion-referenced testing: Testing in which scores are
compared to a fixed, set performance standard. Measures
the mastery of very specific objectives.
• Example: Driver’s License Exam
Norm-Referenced Tests
• Performance of others as basis for interpreting a person’s raw score
(actual number of correct test items)
• Three types of norm groups: 1) Class 2) School District 3) National
• Score reflects general knowledge rather than mastery of specific skills and
information
• Uses: measuring overall achievement and selecting a few top
candidates
• Limitations:
– no indication of whether prerequisite knowledge for more advanced
material has been mastered
– less appropriate for measuring affective and psychomotor
objectives
– encourages competition and comparison of scores
Criterion-Referenced Tests
• Comparison with a fixed standard
• Example: Driver’s License
• Use: Measure mastery of a very specific objective when the goal is
to achieve a set standard
• Limitations
– absolute standards difficult to set in some areas
– standards tend to be arbitrary
– not appropriate when comparison with others is valuable
Comparing Norm- and Criterion-Referenced Tests
• Norm-referenced
– General ability
– Range of ability
– Large groups
– Compares people to people (comparison groups)
– Selecting top candidates
• Criterion-referenced
– Mastery
– Basic skills
– Prerequisites
– Affective
– Psychomotor
– Grouping for instruction
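The two interpretations can be contrasted on a single raw score. The sketch below is only an illustration: the class scores, the cutoff of 80, and the helper names percentile_rank and meets_standard are all assumptions made for this example, not part of the slides. It defines percentile rank as the percentage of the norm group scoring below the raw score, which is one common convention among several.

```python
from bisect import bisect_left

def percentile_rank(raw_score, norm_scores):
    """Norm-referenced view: percent of the norm group scoring below raw_score."""
    ordered = sorted(norm_scores)
    below = bisect_left(ordered, raw_score)  # count of scores strictly below
    return 100.0 * below / len(ordered)

def meets_standard(raw_score, cutoff):
    """Criterion-referenced view: has the fixed standard been reached?"""
    return raw_score >= cutoff

# Hypothetical norm group (one class) and a hypothetical mastery cutoff.
norm_group = [52, 58, 61, 64, 67, 70, 73, 75, 79, 84]
CUTOFF = 80
raw = 70

rank = percentile_rank(raw, norm_group)   # position relative to other people
passed = meets_standard(raw, CUTOFF)      # position relative to the standard

print(f"Raw score {raw}: percentile rank {rank:.0f}, meets standard: {passed}")
```

The same raw score sits in the middle of this norm group yet falls short of the fixed standard, which is why the two kinds of test answer different questions.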
What do Test Scores Mean?
Basic Concepts
• Standardized tests: Tests given under uniform conditions and scored and
reported according to uniform procedures. Items and instructions have been
tried out and administered to a norming sample
• Norming sample: large sample of students serving as a comparison group
for scoring standardized tests
• Frequency distributions: record showing how many scores fall into set
groups, listing number of people who obtained particular scores
• Central tendency: Typical score for a group of scores. Three measures:
– Mean: the average
– Median: the middle score
– Mode: the most frequent score (a distribution with two modes is bimodal)
• Variability: Degree of difference or deviation from the mean
– Range: difference between the highest and lowest score
– Standard deviation: measure of how widely the scores vary from the
mean; the further scores spread from the mean, the greater the SD
– Normal distribution: the bell-shaped curve is an example (Figure 39.2, p. 509)
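As a rough illustration of the measures of central tendency and the range, the sketch below uses Python's standard statistics module on a made-up set of seven scores. The scores are hypothetical and chosen so that the distribution is bimodal.

```python
import statistics

# Hypothetical test scores for seven students.
scores = [70, 75, 75, 80, 85, 85, 90]

mean = statistics.mean(scores)        # average: sum of scores / number of scores
median = statistics.median(scores)    # middle score once the scores are sorted
modes = statistics.multimode(scores)  # every most-frequent score (Python 3.8+)
score_range = max(scores) - min(scores)  # highest score minus lowest score

print("mean:", mean)
print("median:", median)
print("modes:", modes, "(bimodal)" if len(modes) == 2 else "")
print("range:", score_range)
```

Here the mean and median coincide, while two scores tie for most frequent, so the group has two modes.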
Frequency Distribution
Histogram (bar graph of a frequency distribution)
[Figure: histogram; x-axis: Scores on Test (40 to 100); y-axis: Students]
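A frequency distribution can be tallied in a few lines. The sketch below is an assumption-laden illustration: the eight scores are invented, and the text "histogram" is drawn with one # per student rather than with real bars.

```python
from collections import Counter

# Hypothetical scores; each entry is one student's test score.
scores = [55, 70, 70, 75, 85, 85, 90, 100]

# Frequency distribution: how many students obtained each score.
freq = Counter(scores)

# Text histogram: one row per score, one '#' per student.
rows = [f"{score:>3} | {'#' * freq[score]}" for score in sorted(freq)]
print("\n".join(rows))
```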
Calculating the Standard Deviation
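The calculation can be walked through step by step. The sketch below uses five hypothetical scores and assumes the population form of the standard deviation, which divides by the number of scores N. The sample form divides by N - 1 instead and gives a slightly larger value.

```python
import math

# Hypothetical scores for five students.
scores = [70, 75, 80, 85, 90]

# Step 1: find the mean.
mean = sum(scores) / len(scores)

# Step 2: subtract the mean from each score to get the deviations.
deviations = [s - mean for s in scores]

# Step 3: square each deviation so negatives do not cancel positives.
squared = [d ** 2 for d in deviations]

# Step 4: average the squared deviations (this is the variance).
variance = sum(squared) / len(scores)

# Step 5: take the square root to return to the original score units.
sd = math.sqrt(variance)

print(f"mean = {mean}, variance = {variance}, SD = {sd:.2f}")
```

Scores that sit further from the mean produce larger squared deviations, which is why a more spread-out group has a greater SD.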
Aptitude Tests
• Measure abilities developed over years
• Used to predict future performance
• SAT/PSAT
• ACT/SCAT
• IQ and aptitude
• Discussing test scores with families
• Controversy continues over fairness, validity, and bias
Issues in Testing
• Widespread testing (see Table 14.3, p. 534)
• Accountability and high-stakes testing: misuses (see Table 40.3, p. 526)
• Testing teachers: accountability for student performance as well as
for teacher knowledge on teacher tests (see Point/Counterpoint, p. 525)
Summative Assessments
• Occurs at the end of instruction
• Provides a summary of accomplishments
• End of chapter, midterms, final exam
• Purpose is to determine final achievement
Planning for Testing
• Test frequently
• Test soon after learning
• Use cumulative questions
• Preview ready-made tests
Objective Testing
• Objective: not open to many interpretations
• Measures a broad range of material
• Multiple choice most versatile
• Lower and higher level items
• Difficult to write well
• Easy to score
Key Principles: Writing Multiple Choice Questions
• Clearly written stem
• Present a single problem
• Avoid unessential details
• State the problem in positive terms
• Use “not,” “no,” or “except” sparingly, or mark them: NOT, NO, EXCEPT
• Do not test extremely fine discriminations
• Put most wording in the stem
• Check for grammatical match between stem and alternatives
• Avoid exclusive and inclusive words such as all, every, only, never, none
• Avoid two distracters with the same meaning
• Avoid exact textbook language
• Avoid overuse of “all of the above” or “none of the above”
• Use plausible distracters
• Vary the position of the correct answer
• Vary the length of correct answers (long answers are often correct)
• Avoid obvious patterns in the position of your correct answer
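One way to avoid a pattern in the position of the correct answer is to let a random shuffle decide it. The sketch below is only an illustration: shuffle_alternatives is a hypothetical helper, the item content is invented, and it assumes that all alternatives are distinct.

```python
import random

def shuffle_alternatives(correct, distractors, rng):
    """Return the alternatives in random order plus the index of the correct one.

    Assumes every alternative is distinct, so index() finds the right one.
    """
    options = [correct] + list(distractors)
    rng.shuffle(options)  # in-place random reordering
    return options, options.index(correct)

# A seeded generator makes the order reproducible for a given test form.
rng = random.Random(14)
options, answer_index = shuffle_alternatives(
    "Median", ["Mean", "Mode", "Range"], rng
)

for letter, option in zip("ABCD", options):
    print(f"{letter}. {option}")
print("Answer key:", "ABCD"[answer_index])
```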
Essay Testing
• Requires students to create an answer
• Most difficult part is judging quality of answers
• Writing good, clear questions can be challenging
• Essay tests focus on less material
• Require a clear and precise task
• Indicate the elements to be covered
• Allow ample time for students to answer
• Should be limited to complex learning objectives
• Should include only a few questions
Evaluating Essays: Dangers
• Problems with subjective testing
– Individual standards of the grader
– Unreliability of scoring procedures
– Bias: wordy, neatly written essays with few grammatical
errors often get more points, even though they may be completely off the point