
Handbook

For

Trainers of Learning Module

On

Test Item Construction Techniques

DRAFT

June 2011

Facilitators:
Rana Riaz Saeed, Senior Manager-NTS
Murtaza Noor, Project Manager-HEC
TABLE OF CONTENTS

1. BACKGROUND
1.2. Objectives
1.3. Training Module
1.4. Aims
1.5. Objectives of the Module
1.6. Brief Introduction of Module Development Experts

2. ACTIVITY SCHEDULE OF WORKSHOP

Day 1:
3. INTRODUCTION TO TESTING, ASSESSMENT AND EVALUATION
Session 1: Introduction to Testing, Assessment and Evaluation in the Semester System
Session 2: Types of Tests and Assessment
   Types of Educational Tests
   Group Work: Types of Educational Assessments

Session 3:
4. PLANNING THE TEST
   Instructional Objectives and Course Outcomes
   Planning the Test
   The Contents
   Analysis of Curriculum and Textbooks
   Judgments of Experts
   Objectives of the Test
   Guidelines for Writing Objectives of the Test
   Bloom's Taxonomy of Educational Objectives
   Illustrative Examples
   Preparing the Test Blueprint
   Preparing an Outline of Test Contents
   Techniques of Determining Test Contents
   Table of Specifications
   Test Length
   General Principles of Writing Questions for an Achievement Test

Day 2:
Session 1: Presentation of Home Assignment
Sessions 2 & 3:
5. TYPES AND TECHNIQUES OF TEST ITEM WRITING
   What is a Test?
   Achievement Test
   Preparing the Test According to Plan
   Items Commonly Used for Tests of Achievement
   Constructing Objective Test Items
   Alternative Response Items
   Uses of True-False Items
   Suggestions for Constructing True-False Items
   Short-Answer / Completion Items
   Constructing Short-Answer Items
   Multiple Choice Questions
   Characteristics of Multiple Choice Questions
   Desired Characteristics of Items
   Advantages of MCQs
   Limitations of MCQs
   Rules for Constructing Multiple-Choice Items
   A Variety of Multiple Choice Items
   Context-Dependent Items

Day 3:
6. STEP 2: PREPARING THE TEST ACCORDING TO PLAN
   Step 3: Test Administration and Use
   Awarding Grades

7. HOW TO PREPARE EFFECTIVE ESSAY QUESTIONS?
   How to Prepare Better Essay Questions?
   When to Use Essay Questions?
   Guidelines for Constructing Essay Questions

Day 4:
Session 1:
8. EVALUATION OF ITEMS
   Item Analysis
   Item Difficulty
   Item Discrimination
   Interpreting Distracter Values
   Effectiveness of Distracters
Session 2: Item Analysis and Subsequent Decisions
   Decisions Subsequent to Item Analysis
Session 3: Desirable Qualities of a Test
   Reliability
   How to Estimate Reliability?
   Validity
   Practicality
   Objectivity

9. PROFESSIONAL ETHICS IN EDUCATIONAL ASSESSMENT

APPENDICES
Appendix-A: Activity material for flawed items
Appendix-B: Table of Specifications / Test Blueprint
Appendix-C: Cognitive Domain, Instructional Objectives and item example
Appendix-D: Professional Ethics in Assessment (Chapter Reading)
Appendix-E: PowerPoint Presentations
Handbook for Trainers of Learning Module on Test Item Construction
Background

The National Testing Service (NTS) is an autonomous, self-reliant, self-sustaining organization and Pakistan's first premier testing service in the public sector, established to build and promote standards in educational and professional testing and assessment. NTS assesses the competency of candidates for admission, scholarship, recruitment, and promotion. It ensures efficiency, reliability, accuracy, and, most significantly, the credibility of the entire system in a transparent manner under strict security arrangements. Moreover, NTS contributes to human resource development (HRD) by organizing training and capacity-building sessions and by undertaking research and development (R&D).

The Higher Education Commission (HEC) is the primary regulator of higher education in Pakistan. Its main purpose is to upgrade the universities of Pakistan into centres of education, research, and development, and it facilitates the development of the higher education system in Pakistan. The HEC is also playing a leading role in building a knowledge-based economy in Pakistan by awarding hundreds of doctoral scholarships for education abroad every year.

In Pakistan there exist several educational assessment and evaluation systems, including provincial, federal, and divisional Boards of Secondary and Higher Secondary examination and universities' annual or semester examinations. Generally, those who construct and mark examinations have little or no training or orientation in test item development and marking techniques, and there is no dedicated institute to impart training in internationally recognized, standardized examination paper construction and marking techniques. Keeping this background in view, NTS is planning to conduct a series of workshops for the teaching faculties of public-sector universities in collaboration with HEC.

Objectives

The overall objective of the workshops is to acquaint university faculty members with the preliminary steps in student evaluation in the semester system and with effective test item construction techniques.

The main objectives of the workshops are:

 To acquaint participants with test and evaluation paper development techniques in their specified subjects.
 To equip participants with the knowledge, skills, and techniques of efficient paper marking.

Training Module

To initiate the training workshops, NTS and HEC developed a training module on "Effective Test Item Construction Techniques" by engaging educational/psychometric experts from various faculties of HEC-affiliated universities and educational institutions in a two-day consultative meeting held on 17-18 January 2011 at the NTS Secretariat, Islamabad.

This training module will be used to conduct training for target university teachers, from both the natural and social sciences, in all major cities of Pakistan.


Aims

This module is designed to orient university teachers to the goals, concepts, principles, and concerns of testing, assessment, and evaluation. It will also provide teachers an opportunity to design and construct useful test items for assessment and evaluation in their respective subjects for undergraduate/postgraduate classes.

Objectives of the Module

The main objectives of this module are to:

1. Align assessment procedures to the learning objectives of the course.
2. Give the faculty a basic understanding of, and training in, developing and adapting materials for assessing levels of learning (knowledge, comprehension, and application) within the course contents.
3. Follow standard assessment strategies and techniques in the construction of educational tests.
4. Apply test scoring and grading schemes.
5. Pre-test, analyze, review, and modify test items.

In addition, HEC's selected social sciences departments under the project titled "Development/Strengthening of Selected Departments of Social Sciences & Humanities" will be part and parcel of this initiative, as one of the key objectives of that project is building linkages with other professional organizations.


Brief Introduction of the Module Development Experts

Professor Dr. Iftikhar N. Hassan has a master's degree from Stanford University, California, and a PhD in Clinical Psychology from Indiana University, USA. She has worked as Professor and Dean at Allama Iqbal Open University and as Pro-Vice Chancellor at Fatima Jinnah University. Her last assignment was as consultant and chair of the Department of Behavioral Sciences at Karakoram International University, Gilgit-Baltistan.

She has been a researcher and writer on psychology, social, and gender issues. She has more than a hundred research publications and is the author of six books. She is a member of many national and international professional organizations.

Her publications include "Psychology of Women" (a textbook for MA students), Psychological Profile of Rural Women, Case Studies of Successful Women, and Voiceless Melodies. Currently she is working as Executive Director of Gender and Psychological Services, an NGO.

Dr. Iffat S. Dar received her PhD, with an emphasis on psychological assessment, from the University of Minnesota, USA, and her master's degree from Columbia University, New York, USA. Her last regular assignment was as Chief Psychologist at the Federal Public Service Commission (FPSC), Islamabad. Since her retirement she has been working as a consultant with several national and international organizations, including the Ministry of Education Islamabad, the World Bank, the Asian Development Bank, UNESCO, and UNICEF. She has been a visiting professor at various universities of Pakistan, including QAU, FJWU, and Foundation University.

Prof. Dr. Rukhsana Kausar did her PhD at Surrey University, UK, and her post-doc at St. George's Hospital, London, UK, as a Commonwealth Fellow. She has been working at the Department of Applied Psychology, University of the Punjab, for the last 23 years in various academic and administrative capacities, and is currently serving her second tenure as Chairperson of the Department of Applied Psychology. Dr. Rukhsana worked for two years at Fatima Jinnah Women University, Rawalpindi, as Chairperson and Associate Dean. She has been an active researcher and has supervised numerous M.Sc., M.Phil, and PhD research theses. Her research work has been presented extensively at international conferences, and she has about 45 research articles published in journals of national and international repute. She is a member of various international professional bodies in the discipline, such as BPS, APA, IMSA, EHPS, ICP, IAAP, and ECP.

Dr. Mah Nazir Riaz is Professor of Psychology and Dean of Social Sciences, Frontier Women University, Peshawar, Pakistan. Dr. Riaz received her doctorate in Psychometrics from the University of Peshawar, NWFP, Pakistan (1979). Her academic career spans 40 years of university teaching. She joined the University of Peshawar on October 1, 1968, as a lecturer and retired as a Professor of Psychology from the same university in December 2003. She won the University Gold Medal (1966) and the President of Pakistan's Award on completion of her master's studies (1966), as well as recognition, as a Professor of Psychology, for her outstanding academic achievements (2003). She won the Star Women International Award (1996) and received the Distinguished Professor Award for meritorious services from the Ministry of Education, Govt. of NWFP (2003). She also served as Professor of Psychology at the National Institute of Psychology, Center of Excellence, Quaid-i-Azam University, Islamabad, for three years (1999-2002). During this period, she won the President of Pakistan's Award "Izaz-e-Kamal" (gold medal and cash prize) for her lifetime achievements. She joined Frontier Women University, Peshawar, as Dean of Social Sciences in 2006. She was nominated as Eminent Educationist and Researcher by the Higher Education Commission, Islamabad (2006).

She has published more than 60 research papers in national and international journals.
Besides, she is the author of three textbooks: (1) Psychology, 2005; (2) Areas of Psychology, 2007; and (3) Test Construction: Development and Standardization of Psychological Tests in Pakistan, 2008. She has also contributed chapters to edited books published in Pakistan and the USA.

She is a member of the International Society for Interpersonal Acceptance and Rejection (ISIPAR) and is currently the representative of ISIPAR for South Asia. During the last two decades, she has conducted several research studies on parental acceptance-rejection, published in national and international journals, including cross-cultural research with Professor Rohner and Professor Abdul Khaleque, University of Connecticut, USA.

Dr. Iftikhar Ahmad is currently working as Associate Professor at the University of Management and Technology, Lahore. He served as Psychologist at the Federal Public Service Commission and later as Director of Research at the Punjab Public Service Commission. He also worked for some time as Coordinator of the IBA Testing Service, Karachi. He retired last year from GC University, Lahore. His areas of interest are educational and organizational psychology, and testing and mental measurement.


Activity Schedule of Workshop

Estimated participants: 25-30

Day 1
  Opening Ceremony (0.5 hr)
  Session 1: Introduction to Testing, Assessment and Evaluation (1.5 hrs)
     Rationale for Assessment and Evaluation in the Semester System
     Issues and problems (Activity/Discussion)
  Tea/Refreshment Break (0.25 hr)
  Session 2: Types of Tests and Assessment (1.5 hrs)
    Types of Educational Tests
     Achievement
     Diagnostic
     Aptitude
    Types of Assessment
     Formative-Summative
     Theory-Practical
  Lunch and Prayer Break (0.75 hr)
  Session 3: Planning the Test (2 hrs)
     Key Concepts and Contents
     Objectives of the Test
     Bloom's Taxonomy of Educational Objectives
     Table of Specifications
     General Principles of Writing Questions
     Preparing a Table of Specifications (Take-Home Activity)

Day 2
  Session 1: Presentations of Group Activity (1.5 hrs)
  Tea/Refreshment Break (0.25 hr)
  Session 2: Types and Techniques of Test Item Writing (1.5 hrs)
     What is a Test?
  Lunch and Prayer Break (0.75 hr)
  Session 3: Practicum: preparing a variety of items (1.5 hrs)

Day 3
  Session 1: Preparing the Test According to Plan (1.0 hr)
     Alternative Response Items
     Uses of True-False Items
     Suggestions for Constructing True-False Items
     Multiple Choice Questions
     Characteristics of Multiple Choice Questions
  Tea/Refreshment Break (0.25 hr)
  Session 2: Activity: Preparing a Test, Peer Review, Scoring Key (1.0 hr)
     Presentation: Q&A
  Lunch and Prayer Break (0.75 hr)
  Session 3: Step 2: Preparing the Test According to Plan (1.5 hrs)
     Item Development
     Test Instructions
    Step 3: Test Administration and Use
     Administration
     Scoring the Test
     Results
     Awarding Grades
     Review of Test Results
    How to Prepare Effective Essay Questions?
     How to Prepare Better Essay Questions?
     When to Use Essay Questions?
     Guidelines for Constructing Essay Questions
    Activity: Interactive discussion (1.5 hrs)

Day 4
  Session 1: Evaluation of Items (1.5 hrs)
     Item Analysis
     Discrimination by Difficulty
  Tea/Refreshment Break (0.25 hr)
  Session 2: Activity: Item Analysis and Subsequent Decisions (1.0 hr)
    Desirable Qualities of a Test (0.75 hr)
     Reliability
     Validity
     Practicality
     Objectivity
  Lunch and Prayer Break (1.0 hr)
  Session 3: Professional Ethics in Educational Testing (0.5 hr)
    Feedback and Course Evaluation (0.25 hr)
    Closing Ceremony (0.25 hr)


Introduction to Assessment and Evaluation in Semester System

Day 1
Session 1: Rationale for Assessment and Evaluation in the Semester System

The session will begin with a brief introduction of the participants. The trainer will introduce herself and then ask the participants to introduce themselves.

Basic Concept: Lecture by the Trainer

The trainer will talk about the importance of well-constructed examinations: exams are the goalposts that guide and motivate students to learn. We all know from our own experience how students prepare for examinations. They learn not only what interests them most or what is presented in a better way, but also according to the type of paper they expect from the teacher. For this reason, a well-prepared examination paper is a guarantee of an effective teaching-learning process.

Examinations have undergone radical change in the past fifty years due to improvements in measurement techniques and a better understanding of learning processes. Compared with a lengthy three-hour essay-type examination, a thirty-minute objective-type paper can assess more comprehensively, measuring not only knowledge but also comprehension and application of knowledge. Additionally, a well-prepared paper can evaluate students objectively and quickly, so a large number of students in a class is not a problem.

Why should we change from our traditional system to a newer one? The volume of knowledge has increased so much that our youth need to learn a larger number of subjects for the same degree; this is why educational institutions adopted the semester system, and many universities are now moving to the quarter system. Students have to cover at least one year's worth of courses in a single semester. This has become necessary to meet world standards of education. Additionally, there is the concept of continuous assessment, which requires more than one examination per semester, as well as class assignments and projects to assess different types of learning, e.g., the ability to express oneself in writing, and the ability to collect data and draw conclusions from empirical information.

Issues and Problems (Practical Work/Discussion; Time: one hour)

Participants will write down the advantages and disadvantages of objective-type and essay-type examinations. This will be followed by sharing these points and discussing the pros and cons of each.

Session 2: Types of Tests and Assessment

This session is based on Bloom's Taxonomy and on how one can devise techniques to assess different types of abilities.

Types of Educational Tests

There are different types of tests used by educationists and psychologists. In fact, psychologists have influenced the evaluation system more than one likes to give them credit for. One can find a large number of tests on all types of abilities, aptitudes, and abnormalities in test catalogues; however, we are going to talk about those tests that concern classroom achievement only. These are also called scholastic achievement tests. We are going to talk about the following three types of tests:


1. Classroom Achievement Tests

These are the tests with which every teacher is familiar and which he/she has to construct to judge the achievement of his/her students. If the test is well written and covers the entire course, it will be a better measure of student achievement and can discriminate between good and poor learners. Learning needs to be understood in terms of Bloom's taxonomy and should not be just rote memorization.

2. Diagnostic Achievement Tests

A good achievement test can easily identify those students who have not comprehended a particular concept, so that the teacher can work with those students separately. In large classes with students coming from a variety of backgrounds, it is essential that the teacher knows, with the help of such tools, which students need more help. Generally one accomplishes this through quizzes and short tests after covering a certain portion of the subject matter.

3. Scholastic Aptitude Tests (SAT)

A scholastic aptitude test is an ability test that predicts whether a student has the ability to succeed in the classroom. Most of these are paper-and-pencil group tests, given at the time of admission to screen students. These are not intelligence tests in the classical sense; intelligence is a much wider concept than scholastic aptitude. Good examples are the GRE and the GRE Subject tests.

Reference Material
Thorndike's book on educational measurement is a bible of test development. It, Guilford's book on statistics, and more modern books on test development should be read by all participants and provided as references during the workshop.

Group Work (30 min)

Types of Educational Assessments

a. Formative - Summative
b. Theory - Practical

The participants will discuss different types of assessment in an interactive session. They will be divided into four groups, each assigned one type of assessment, and each group will present its report.


Planning the Test

Session 3: Planning the Test


Instructional Objectives and Course Outcomes

Session Learning Outcomes

The learning outcomes for this session are to help the participants to:
 Conceive of instructional objectives and course outcomes
 Design a table of specifications in accordance with instructional objectives

Key Concepts and Content

 Defining instructional objectives
 Designing a table of specifications

Education is a process that helps students change in many ways. Some aspects of the change are intentional, whereas others are quite unintentional. Keeping this assumption in view, one of the important tasks of university teachers is to decide, as far as possible, how they want their students to change, and to determine their own role in this process of change. Secondly, upon completion of a particular unit/course, teachers need to determine whether their students have changed in the desired manner. Furthermore, they have to assess the unintentional outcomes as well.

Planning the Test

Test planning includes several activities that the teacher must carry out to devise a new test. As a first step, the teacher must draw up a test "blueprint", specifying: 1) the content and 2) the objectives of the test, along with the types of items, practice exercises, time limit, etc.

1) The Content

Analysis of Curriculum and Textbooks

The outline of a classroom test should include details of the test content in the specific course. Moreover, each content area should be weighted roughly in proportion to its judged importance. Usually, the weights are assigned according to the relative emphasis placed upon each topic in the textbook. The median number of pages on a given topic in the prescribed books is usually considered an index of its importance.

Analysis of the curricula prescribed by the various Boards of Intermediate and Secondary Education (BISE) and the public-sector and private universities of Pakistan provides another method of determining the subject-matter content and instructional objectives to be measured by the proposed test.

Judgments of Experts
To devise a classroom test, the advice and assistance of subject-matter experts, serving as consultants, can prove to be of immense value. The best way to seek consultants' judgments is to submit to them a tentative outline prepared after studying the instructional objectives as stated in representative curricula and the subject-matter content as indicated by up-to-date and widely used textbooks.


2) Objectives of the Test

Objectives
The basic objective of an educational achievement test is to assess the desired changes brought about by the teaching-learning process. Obviously, each subject demands a different set of instructional objectives. For example, the major objectives of subjects like the sciences, social sciences, and mathematics are knowledge, understanding, application, and skill. On the other hand, the major objectives of a language course are knowledge, comprehension, and expression. The knowledge objective is considered the lowest level of learning, whereas understanding and application of knowledge in the sciences/behavioral sciences are considered higher levels of learning.

As the basic objectives of education are concerned with the modification of human behavior,
the teacher must determine measurable cognitive outcomes of instruction at the beginning of
the course. The evaluation process determines the extent to which the objectives have been
attained, both for the individual students and for the class as a whole. Such an evaluation
provides feedback that can suggest modification of either the objectives or the instruction or
both.

Some objectives are stated as broad, general, long-range goals, e.g., ability to exercise the
mental functions of reasoning, imagination, critical appreciation. These educational
objectives are too general to be measured by classroom tests and need to be operationally
defined by the class teacher.

Guidelines for Writing Objectives of the Test

When writing objectives, the teacher may follow these guidelines:
1. Begin each objective with an action verb that specifies observable behavior, for example: identify, formulate, describe.
2. State each objective in terms of student performance, as an outcome or learning product.
3. State each objective clearly, without creating any ambiguity.

Bloom's Taxonomy of Educational Objectives

Bloom's Taxonomy classifies instructional objectives into three major domains: cognitive, affective, and psychomotor. The largest proportion of educational objectives falls into the cognitive domain.

 The Cognitive Domain is the core of curriculum and test development. It is largely concerned with descriptions of student behavior in terms of knowledge, understanding, and abilities that can be demonstrated.

 The Affective Domain includes objectives that emphasize interests, attitudes, and values, and the development of appreciations.

 The Psychomotor Domain is concerned with physical, motor, or manipulative skills.

Bloom's taxonomy classifies the behaviors included in the cognitive domain into the following categories:
1. knowledge
2. comprehension
3. application
4. analysis
5. synthesis
6. evaluation


 Knowledge Level
Tasks at the knowledge level involve the psychological processes of remembering. Items in the knowledge category involve the ability to recall important information: knowledge of specific facts, definitions of important terms, familiarity with important concepts, etc. Thus knowledge-level questions are formulated to assess previously learned information.

 Comprehension Level
The common cognitive processes required at the comprehension level are translation, interpretation, and extrapolation.

 Application Level
Tasks at the application level require the use of previously learned information in new and concrete situations to solve a problem. They require mastery of a concept well enough to recognize when and how to use it correctly in an unfamiliar or novel situation. The fact that most of what we learn is intended to be applied to problem situations in everyday life demonstrates the importance of application objectives in the curriculum.

The taxonomy levels of knowledge, comprehension, and application are considered more valuable for curriculum development and educational evaluation than analysis, synthesis, and evaluation. Furthermore, the taxonomy does not suggest that all good tests or evaluation techniques must include items from every level of the taxonomy.

Illustrative Examples

Knowledge Level

Which of the following does not belong with the others?
a. aptitude tests
b. personality tests
c. intelligence tests
d. achievement tests

Which of the following measures involves nominal data?
a. the test score on an examination
b. the number on a basketball player's jersey
c. the speed of an automobile
d. the class rank of a college student

Comprehension Level

A psychologist monitors a group of nursery-school children, recording each instance of altruistic behavior as it occurs. The psychologist is using:
a. case studies
b. the experimental method
c. naturalistic observation
d. the survey method

In a study of the effect of a new teaching technique on students' achievement test scores, an important extraneous variable would be the students':
a. hair color
b. athletic skills
c. IQ scores
d. sociability

Application Level

The results of Milgram's (1963) study imply that:
a. in the real world, most people will refuse to follow orders to inflict harm on a stranger
b. many people will obey an authority figure even if innocent people get hurt
c. most people are willing to give obviously wrong answers when ordered to do so
d. most people stick to their own judgment, even when group members unanimously disagree

What is your chance of flipping 4 heads or 4 tails in a row with a fair coin (i.e., one that comes up heads 50% of the time)?
a. .0625
b. .125
c. .250
d. .375

Problem Solving
Problem solving refers to active efforts to discover an underlying process leading to the achievement of a goal, for example, series completion, analogies, and transformation problems.

Examples¹
1. A teacher had 28 students in her class. All but 7 of them went on a museum trip and thus were away for the day. How many students remained in the class that day?
2. The water lilies on the surface of a small pond double in area every 24 hours. From the time the first water lily appears until the pond is completely covered takes 60 days. On what day is half of the pond covered with lilies?
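As a quick check of the two examples above: in item 1, "all but 7" went on the trip, so 7 students remained (the 28 is a distracter); in item 2, the pond is half covered one doubling before it is full, i.e., on day 59. A minimal sketch of the day-59 reasoning (the code is illustrative, not part of the source):

```python
# The lily-covered area doubles every 24 hours and fills the pond on day 60.
# Working backwards, half coverage occurs one doubling earlier: day 59.
FULL_DAY = 60
coverage = 1.0  # fraction of the pond covered on day 60
day = FULL_DAY
while coverage > 0.5:
    coverage /= 2  # undo one day's doubling
    day -= 1
print(day)  # 59
```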
In the following illustrations, we have used only three levels: knowledge (recall/recognition), comprehension (understanding), and application (or skills), and have labeled the columns accordingly.

Preparing the Test Blueprint

The blueprint is meant to ensure content validity, the most important characteristic of an achievement test devised to determine the GPA at the end of a unit, term, or course of instruction. The test may be based on several lessons or chapters in a textbook, reflecting a balance between content areas and learning objectives. The test blueprint must specify both the content and the process objectives in proportion to their relative importance and emphasis in the curriculum.

Depending on the purpose of the test and the instructional objectives, the test may vary in length, difficulty, and format (objective, essay, short-answer, open-book, or take-home).

¹ Source: Sternberg, R.J. (1986). Intelligence Applied: Understanding and Increasing Your Intellectual Skills. New York: Harcourt Brace Jovanovich.


Table 1
Test Blueprint / General Layout of an Achievement Test

Purpose of the Test:            Minimum competency, mastery, diagnosis, selection.
Nature of the Test:             Norm-referenced or criterion-referenced.
Target Population:              School children, college or university students, trainees of a course, employees of an organization.
Format of Items:                Objective type (multiple-choice, true-false, matching, completion), short answer, essay type, computer-administered.
Test Length:                    Approximate number of items in each category (objective type, essay type, short answer).
Testing Time:                   Maximum time limit.
Mode of Test Administration:    Individual, group, computer.
Examiners' Characteristics:     Qualifications, training, experience.
Test Content:                   Verbal, pictorial, performance.
Sources of Test Content:        Textbooks, subject experts, curriculum.
Items per Content Stratum:      Depends on the relative importance of content areas.
Appropriate Difficulty:         Difficulty level in relation to the purpose of the test (minimum competency, mastery, selection, diagnosis, etc.).
Taxonomy Level of Items:        Knowledge, comprehension, application, analysis, synthesis, evaluation.
Scoring Procedure:              Hand scoring, machine scoring, computer-assisted scoring, or grading (as in essay examinations).
Interpretation:                 Norms, percentiles, grade equivalents, etc.
Item Analysis:                  Qualitative, quantitative, or both.
Reliability Techniques:         Test-retest, parallel form, Kuder-Richardson, examiner or inter-rater.
Validity:                       Content, criterion-related (predictive/concurrent), construct.

Source: Riaz, M.N. (2008). Test Construction: Development and Standardization of Psychological Tests in Pakistan. Islamabad: HEC.

Preparing an Outline of Test Contents

The term test content refers to a representative sample of the course content, skills, and cognitive levels/instructional objectives to be measured by the test. The author of the test has to prepare a test plan, or table of specifications, that clearly shows the relative emphasis on various topics and different types of behavior.

Techniques of Determining Test Contents

Analysis of Instructional Objectives

The following tables of specifications show how to develop a relatively broad outline of a classroom test.

Our first illustration relates to a Premedical Science Achievement Test. We may prepare a 100-item test according to the following table of specifications, showing instructional objectives and content areas.


Table 2
Specifications Related to Instructional Objectives and Content of a Premedical Science Achievement Test

1. Objectives of instruction                                         Percent of Items
   a. Recall of basic concepts                                             30
   b. Comprehension, interpretation, analysis of scientific content        40
   c. Application of concepts, principles, etc.                            30
   Total                                                                  100

2. Content areas                                                     Percent of Items
   a. Biology                                                              40
   b. Chemistry                                                            40
   c. Physics                                                              20
   Total                                                                  100

Source: Riaz, M.N. (2008). Test Construction: Development and Standardization of Psychological Tests in Pakistan. Islamabad: HEC.
In practice, a much more detailed outline of the contents within each cell of the table is needed before test construction proceeds. Combined in a two-way table, the above specifications are presented in Table 3.

Table 3
Number of Items in Each Category of a Premedical Science Achievement Test

Content/      Recall of        Comprehension   Application of               Total
Subjects      Basic Concepts                   Concepts, Principles, etc.
Biology             12               16                 12                    40
Chemistry           12               16                 12                    40
Physics              6                8                  6                    20
Total               30               40                 30                   100

Source: Riaz, M.N. (2008). Test Construction: Development and Standardization of Psychological Tests in Pakistan. Islamabad: HEC.
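Each cell count in Table 3 follows directly from the Table 2 margins: cell = content weight × objective weight × total items, e.g., Biology × Recall = 0.40 × 0.30 × 100 = 12. A minimal sketch of this arithmetic (the variable names are mine, not the source's):

```python
# Derive the Table 3 cell counts from the Table 2 marginal weights:
# cell = content_weight * objective_weight * total_items.
TOTAL_ITEMS = 100
content = {"Biology": 0.40, "Chemistry": 0.40, "Physics": 0.20}
objectives = {"Recall": 0.30, "Comprehension": 0.40, "Application": 0.30}

for subject, cw in content.items():
    row = {obj: round(cw * ow * TOTAL_ITEMS) for obj, ow in objectives.items()}
    print(subject, row, "total:", sum(row.values()))
# Biology {'Recall': 12, 'Comprehension': 16, 'Application': 12} total: 40
# Chemistry {'Recall': 12, 'Comprehension': 16, 'Application': 12} total: 40
# Physics {'Recall': 6, 'Comprehension': 8, 'Application': 6} total: 20
```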
Formatted: Font: 9 pt
Table 4
Number of Items in Each Category of an Achievement Test on "Principles of Psychological Measurement"

Content/Subject-matter                      Recall of        Comprehension,   Analysis,       Total
                                            Basic Concepts   Application      Synthesis,
                                                                              Evaluation
1. Basic statistical concepts:
   variability, correlation, prediction          1                 2               2           5 (10%)
2. Scales, transformation, norms                 3                 2               0           5 (10%)
3. Reliability: concepts, theory, and
   methods of estimation                         3                 3               4          10 (20%)
4. Validity: content, construct,
   criterion-related validity                    4                 6               5          15 (30%)
5. Item analysis: item characteristics,
   distracter analysis, item discrimination,
   item characteristic curves                    4                 7               4          15 (30%)
Total                                           15 (30%)          20 (40%)       15 (30%)     50

Source: Riaz, M.N. (2008). Test Construction: Development and Standardization of Psychological Tests in Pakistan. Islamabad: HEC.


The above table shows a test outline for an achievement test based on "Principles of Psychological Measurement" (Part II: Chapters 4-10 of Psychological Testing by Murphy, K. R., & Davidshofer, C. O., 1998).

Table of Specifications
A table of specifications is a two-way table that represents, along one axis, the content areas/topics that the teacher has taught during the specified period and, along the other axis, the cognitive level at which each is to be measured. In other words, the table of specifications shows how much emphasis is to be given to each objective or topic.

Table 5
A Classroom Test in Experimental Psychology: Semester 1

Subject/     Instructional Objectives: Bloom's Taxonomy
Content      Knowledge &       Application   Analysis, Synthesis    Total
             Comprehension                   & Evaluation
Topic A           10%               20%             10%              40%
Topic B           15%               15%             30%              60%
Total             25%               35%             40%             100%

While writing the test items, it may not be possible to adhere rigorously to the weights assigned to each cell of a table of specifications like the one presented in Table 5. Thus, the weights indicated in the original table may need to be changed slightly during the course of test construction, if the teacher encounters sound reasons for such a change. For instance, the teacher may find it appropriate to modify the original test plan in view of data obtained from an experimental try-out of the new test.

Preparing a Table of Specifications

Table 6
Table of Specifications showing only one topic area (Problem Solving) and three levels of cognitive objectives (Knowledge, Comprehension, Application)

Content/Topics                                   Knowledge   Comprehension   Application   Number of Items
Problem solving in search of solutions              10%            -              -             10%
Barriers to effective problem solving               10%           15%             -             25%
Approaches to problem solving                        -            15%            20%            35%
Culture, cognitive style, and problem solving        -             -             30%            30%

The above table shows that the first topic is to be measured only at the knowledge level and the fourth topic only at the application level. The second and third topics are each to be measured at two different levels: topic 2 at knowledge and comprehension; topic 3 at comprehension and application. Preparing a test according to the above table of specifications means that 20% of the items in our test measure knowledge, 30% measure comprehension, and 50% measure application.

Test Length
The number of items that should constitute the final form of a test is determined by the purpose of the test or its proposed uses, and by the statistical characteristics of the items. Some of the important considerations in setting test length are:

1. The optimal number of items for a homogeneous test is lower than for a highly heterogeneous test.
2. Items meant to assess higher thought processes, such as logical reasoning, creativity, and abstract thinking, require more time than those that depend on the ability to recall important information.
3. Another important consideration in determining test length, and the time required for it, is the validity and reliability of the test. The teacher has to determine the number of items that will yield maximum validity and reliability for the particular test (see the sketch following this list).
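One standard tool for the length-versus-reliability trade-off in point 3 is the Spearman-Brown prophecy formula, which predicts the reliability of a test whose length is multiplied by a factor k: r_new = k·r / (1 + (k − 1)·r). This is a well-known psychometric result rather than something stated in this module; the sketch below simply evaluates it.

```python
def spearman_brown(reliability: float, k: float) -> float:
    """Predicted reliability when test length is multiplied by factor k
    (Spearman-Brown prophecy formula)."""
    return k * reliability / (1 + (k - 1) * reliability)

# Example: a 20-item test with reliability 0.70, doubled to 40 items.
print(round(spearman_brown(0.70, 2.0), 3))  # 0.824
```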

General Principles of Writing Questions for an Achievement Test

Different types of questions can be devised for an achievement test, for instance: multiple choice, fill-in-the-blank, true-false, matching, short answer, and essay. Although each type of question is constructed differently, the following principles apply to constructing questions and tests in general:

1. Instructions for each type of question must be simple and brief.
2. Questions must be written in simple language. If the language is difficult or ambiguous, even a student with strong language skills and a good vocabulary may answer incorrectly if his/her interpretation of the question differs from the author's intended meaning.
3. Test items must assess a specific ability or comprehension of content developed during the course of study.
4. Write the questions as you teach, or even before you teach, so that your teaching may be aimed at significant learning outcomes.
5. Devise questions that call for comprehension and application of knowledge and skills.
6. Some of the questions must aim at appraising examinees' ability to analyze, synthesize, and evaluate novel instances of the concepts. If the instances are the same as those used in instruction, students are only being asked to recall (knowledge level).
7. Questions should be written in different formats, e.g., multiple-choice, completion,
true-false, short answer etc. to maintain interest and motivation of the students.
8. Prepare alternate forms of the test to deter cheating and to provide for make-up
testing (if needed).
9. The items should be phrased so that the content rather than the format of
the statements will determine the answer. Sometimes the item contains “specific
determiners” which provide an irrelevant cue to the correct answer. For example,
statements that contain terms like always, never, entirely, absolutely, and exclusively
are much more likely to be false than to be true. On the other hand, such terms as
may, sometimes, as a rule, and in general are much more likely to be true. Besides,
care should be taken to avoid double negatives, complicated sentence structures,
and unusual words.
10. The difficulty level of the items should be appropriate to the ability level of the group. Optimal difficulty for true-false items is about 75 percent, for five-option multiple-choice questions about 60 percent, and for completion items approximately 50 percent. However, difficulty is not an end in itself; item content should be determined by the importance of the subject matter. It is desirable to place a few easy items at the beginning to motivate students, particularly those of below-average ability. (A difficulty-index sketch follows this list.)
11. The items should be devised in such a manner that different taxonomy levels are evaluated. Besides, achievement tests should be power tests, not speed tests.
12. Items pertaining to a specific topic, or of a particular type, should be placed together in the test. Such grouping facilitates scoring and evaluation. It also helps examinees to think about and answer items that are similar in content and format without fluctuations of attention or changes of mind-set.
13. Directions to the examinees should be as simple, clear, and precise as possible, so
that even those students who are of below average ability can clearly understand
what they are expected to do.
14. Scoring procedures must be clearly defined before the test is administered.
15. The test constructor must clearly state optimal testing conditions for test
administration.
16. Item analysis should be carried out to make necessary changes, if any ambiguity is
found in the items.
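Point 10 above quotes target difficulty levels; the usual way to check them after a tryout is the item difficulty index, p = (number answering correctly) / (number attempting the item). A minimal sketch (the thresholds in the comment come from point 10; the function name is mine):

```python
def difficulty_index(num_correct: int, num_examinees: int) -> float:
    """Item difficulty p: the proportion of examinees answering correctly."""
    return num_correct / num_examinees

# Targets quoted in point 10: about 0.75 for true-false items, about 0.60
# for five-option multiple choice, and roughly 0.50 for completion items.
p = difficulty_index(18, 30)
print(f"p = {p:.2f}")  # p = 0.60, about right for a five-option MCQ
```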

Table of Specifications (Take-Home Activity)

Participants will be asked to work individually and prepare a table of specifications for one of the courses they are teaching.


Day 2: Session 1
a. Presentations based on the home assignment.
b. Participants will share their drafts with group members for discussion and further input.

Types and Techniques of Test Item Writing

Sessions 2 & 3

Objectives of the Sessions

 To develop understanding of the essential steps in test construction for ensuring/enhancing validity and reliability.
 To enable participants to understand the distinctions between a variety of achievement test items, their characteristics, and the appropriate usage of each item type as distinguished from the others.
 To apprise participants of the techniques of writing good test items.
 To enable participants to understand scoring procedures and the meaning of scores.
 To develop understanding and appreciation of scientific ways of awarding and using grades.

What is a Test?
A test is an instrument or a tool. It follows a systematic procedure for measuring a sample of behavior by posing a set of questions in a uniform manner. It is an attempt to measure what a person knows or can do at a particular point in time. Furthermore, a test answers the question of how well the individual performs, either in comparison with others or in comparison with a domain of performance tasks.

Achievement Test
A test designed to appraise what the individual has learned as a result of planned previous experience or training is an achievement test. Since it relates to what has already been learnt, its frame of reference is the present or the past.

Basic Assumptions

Preparation of an achievement test assumes that the content and/or skill domain covered by the test can be specified in behavioral terms, and that the knowledge and skill to be measured can be specified in a manner readily communicable to other persons. It is important that the test measures the important goals rather than peripheral or incidental ones. It also assumes that the test takers have had the opportunity to learn the material covered by the test.

Achievement tests are designed specifically to measure the degree of accomplishment in some particular educational or training experience. They measure the knowledge and skills developed in a relatively circumscribed area (domain). This area may be as narrow as one day's class assignment or as broad as several years' study.


Achievement tests attempt to measure what a person knows or can do at a particular point in time. Furthermore, our reference is usually to the past; that is, we are interested in what has been learned as a result of a particular course, experience, or series of experiences.

Step 2

Preparing the Test According to Plan

The next step after planning the test is preparing it in accordance with the plan. This step mainly deals with developing items and organizing them in the form of a test. Before we discuss preparing the test, it seems reasonable to first discuss the different types of test items, their characteristics, uses, and limitations.

Items Commonly Used for Tests of Achievement

Two major types of items have been identified:
1. Constructed Response / Supply items
2. Structured Response / Select items

1. Constructed Response / Supply Items

In supply-type items the question is framed so that the examinee has to supply or construct the answer on his or her own, in his or her own words. They generally include the following types:
 Essay
 Short Answer
 Completion

2. Structured Response / Select Items

In select-type items, as the name suggests, the examinee is required to select the correct answer from among the given (structured) options. They are often called objective items. They include:
 Alternate Response
 Multiple Choice
 Matching

The Constructed Response / Supply type items must be dealt with in another module. Let us restrict ourselves here to the use, limitations, and construction of Structured Response / Select type items.

Constructing Objective Test Items: Simple Forms (Structured Response / Select)

Construction of test items is a crucial step, for the validity of a classroom test is determined by the extent to which the performance to be measured is called forth by the test items. It is not enough to have knowledge of the subject matter, defined learning outcomes, or a psychological understanding of students' mental processes, although all of these are prerequisites. The ability to construct high-quality test items requires knowledge of the principles and techniques of test construction and skill in their application.

Objective test forms typically measure relatively simple learning outcomes.


Alternative Response Items

An alternative response item, by definition, is one that offers two options to choose from. Such items often consist of a declarative statement that the examinee is asked to mark true or false, right or wrong, correct or incorrect, yes or no, agree or disagree, or the like.

Incomplete sentences providing two options to fill in the blank also fall into this category. A very common use of such items is to test knowledge of grammar, the appropriate use of tense, and the contextual meaning or spelling of words, mainly words that sound alike.

In each case there are only two possible answers. The most common form is the true-false question.

Uses of True-False Items

The most common use of the true-false item is in measuring the examinee's ability to identify the correctness of statements of fact, definitions of terms, statements of principles, and the like, and also to distinguish fact from opinion.

True-false tests often include opinion statements to which the examinee is asked to respond true or false. There is no objective basis for determining whether a statement of opinion is true or false. In most situations, when a student is the respondent, he guesses what opinion the teacher holds and marks the answer accordingly. This, of course, is undesirable from the standpoints of testing, teaching, and learning alike. An alternative procedure is to attribute the opinion to some source, making it possible to mark the statement true or false with some objectivity. This allows measuring knowledge of the beliefs that may be held by an individual or the values supported by an organization or institution.

Another aspect of understanding that can be measured by the true-false item is the ability to recognize cause-and-effect relationships. This type of item usually contains two true propositions in one statement, and the examinee is to judge whether the relationship between them is true or false.

The true-false item can also be used to measure some simple aspects of logic.

Criticism
A common criticism of the true-false item is that an examinee may be able to recognize a false statement as incorrect but still not know what is correct. To overcome this difficulty, some teachers prefer to have students change all false statements to true ones.

Advantages
A major advantage of true-false items is that they are efficient.
 Students can typically respond to roughly three true-false items in the time it takes to respond to two multiple-choice items.
 True-false items have utility for measuring a broad range of verbal knowledge.
 A wide sampling of course material can be obtained.
 A wide sampling of course material can be obtained.

Limitations
The limitations of true-false items lie in the types of learning outcomes that can be measured.
 The apparent ease of constructing true-false items is, unfortunately, more illusory than real.
 True-false items are not especially useful beyond the knowledge area.
 The exceptions to this seem to be distinguishing between fact and opinion and identifying cause-and-effect relationships. These two outcomes can be measured more effectively by other forms of selection items, especially the multiple-choice form.
 Another factor that limits the usefulness of the true-false item is its susceptibility to guessing.

Successful guessing on the true-false item has effects that are deleterious in at least two ways:
1. The reliability of each item is low.
2. Such a test has very little diagnostic value.
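
To see how serious blind guessing can be, consider the arithmetic directly. The following is a minimal illustrative sketch (Python, standard library only; the test length and pass mark are hypothetical) of the chance that pure guessing earns a passing score on a short true-false test:

```python
from math import comb

n, pass_mark = 10, 6  # a hypothetical ten-item true-false test with a pass mark of six

# With two options per item, a blind guess is right with probability 1/2,
# so the number of correct guesses follows a binomial(n, 0.5) distribution.
p_pass = sum(comb(n, k) for k in range(pass_mark, n + 1)) / 2**n
print(f"Chance of passing by guessing alone: {p_pass:.1%}")  # about 37.7%
```

A result this large on so short a test is one reason each true-false item contributes little reliable information.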

Another concern that needs to be considered in the design of tests with true-false items is
student response sets.
 A response set is a consistent tendency to follow a certain pattern in responding to
test items.

Note: True-false items are most useful in situations in which there are only two possible
alternatives (for instance right, left, more, less, who, whom, and so on) and special uses
such as distinguishing fact from opinion, cause from effect, superstition from scientific belief,
relevant from irrelevant information, valid conclusions, and the like.

Suggestions for Constructing True-False Items


 Avoid trivial statements.
 Avoid broad general statements.
 Avoid the use of negative statements, especially double negatives. When a negative
word must be used, it should be underlined or put in italics so that students do not
overlook it.
 Avoid complex sentences.
 Avoid including two ideas in one statement, unless cause-effect relationships are
being measured.
 Avoid using opinion that is not attributed to some source, unless the ability to
identify opinion is being specifically measured.
 Avoid using true statements and false statements that are unequal in length.
 Avoid using disproportionate numbers of true statements and false statements.

Short-Answer / Completion Items


The short-answer item and the completion item are both supply-type test items. They are
included here because of their simplicity.

They can be answered by a word, phrase, number, or symbol.

The short-answer item uses a direct question whereas the completion item consists of an
incomplete statement.
 Short-answer item is especially useful for measuring problem-solving ability in
science and mathematics.
 Complex interpretations can be made when the short-answer item is used to
measure the ability to interpret diagrams, charts, graphs, and pictorial data.

When short-answer items are used the question must be stated clearly and concisely. It
should be free from irrelevant clues, and require an answer that is both brief and definite.

Advantages of Short-Answer Items


The short-answer test item is one of the easiest to construct. It is used almost exclusively to
measure the recall of memorized information.

Because the student must supply the answer in a short-answer item, the possibility that the
examinee will obtain the correct answer by guessing is reduced.

Limitations of Short-Answer Items
The short-answer item is not suitable for measuring complex learning outcomes. Unless the
question is carefully phrased, many answers of varying degrees of correctness must be
considered for total or partial credit; hence it is difficult to score.

These limitations are less troublesome when the answer is to be expressed in numbers or
symbols, as in physical science or mathematics.

Constructing Short-Answer Items


The following suggestions will help to avoid possible pitfalls and provide greater assurance
that the items will function as intended.
 Word the item so that the required answer is both brief and specific. A direct question
is generally more desirable than an incomplete statement.
 Do not take statements directly from textbooks to use as a basis for short-answer
items.
 If the answer is to be expressed in numerical units, indicate the type of answer
wanted.
 Blanks for answers should be equal in length and in a column to the right of the
question.
 Do not include too many blanks.

Multiple Choice Questions

What is a Multiple Choice Item?


The multiple choice item (MCQ) consists of two distinct parts:
1. The first part, which contains the task or problem, is called the stem of the item. The
stem may be presented either as a question or as an incomplete statement. The form
makes no difference as long as it presents a clear and specific problem to the
examinee.

2. The second part presents a series of options or alternatives. Each option represents
a possible answer to the question. In a standard form one option is the correct or the best
answer, called the keyed response, and the others are misleads or foils, called
distracters.

The number of options used differs from one test to another. An item must have at least
three answer choices to be classified as a multiple choice item. The typical pattern is to have
four or five choices to reduce the probability of guessing the answer. In a good item, all of
the presented options look like probable answers, at least to those examinees who do not
know the answer.

Terminology: Multiple Choice Questions


1. Stem: presents the problem
2. Keyed Response: correct or best answer

3. Distracters: appear to be reasonable answers to the examinee who does not know
the content
4. Options: include the distracters and the keyed response.
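
This terminology maps naturally onto a small data structure, which can make the parts easier to keep straight when drafting items. A minimal sketch in Python; the field names (stem, options, key) are illustrative choices, not a standard notation:

```python
from dataclasses import dataclass

@dataclass
class MCQItem:
    stem: str           # the question or incomplete statement that presents the problem
    options: list[str]  # all answer choices: the keyed response plus the distracters
    key: int            # index of the keyed response (the correct or best answer)

    def distracters(self) -> list[str]:
        # every option except the keyed response
        return [opt for i, opt in enumerate(self.options) if i != self.key]

item = MCQItem(
    stem="Which part of a multiple choice item presents the problem?",
    options=["The stem", "The key", "A distracter", "The scoring key"],
    key=0,
)
print(item.distracters())  # the misleads or foils
```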

Characteristics of Multiple Choice Questions


Multiple choice items are considered the best of all item types that can be scored
objectively.

1. The MCQ is the most flexible of the objective type items. It can be used to appraise
the achievement of any educational objectives that can be measured by a paper-and-
pencil test except those relating to skill in written expression and originality.
2. An ingenious and talented item writer can construct an MCQ to measure a variety of
educational objectives from rote learning to more complex learning outcomes like
comprehension, interpretation, application of knowledge and also those that require
the skills of analysis or synthesis to arrive at the correct answer.
3. Moreover, the chances of getting a correct response by guessing are significantly
reduced.

However, good multiple choice items are difficult to construct. A thorough grasp of the
subject matter and skillful application of certain rules is needed to construct good multiple
choice items.

Desired characteristics of items


 Desirable difficulty level
 Ability to discriminate between high and low performers
 Effective distracters

Advantages of MCQ
 Wide sampling of content
 The problem or the task is well structured or clearly defined.
 Flexible Difficulty Level
 Efficient scoring of items
 Objective scoring
 Provide scores easily understood and transformed as needed:

Multiple choice tests provide scores in metrics that are familiar to most score users,
e.g. percentiles and grade-equivalent scores.

Limitations of MCQ
Multiple choice items, despite having advantages over other item types, have some serious
limitations as well.
 They take time to construct.
 They are susceptible to guessing.
 They do not provide any diagnostic information.

Rules for Constructing Multiple - Choice Items


1. Be sure that the stem clearly formulates a problem. The stem should be worded
so that the examinee clearly understands the question being asked before he
reads the answer choices.

2. The stem should be written either in direct question form or in an incomplete
statement form.

3. The stem of the item should present only one problem. Two concepts must not
be combined together to form a single stem.

4. Include as much of the item as possible in the stem and keep options as short as
possible: this leads to economy of space, economy of reading time and a clear
statement of the problem. Hence, include most of the information in the stem and
avoid repeating it in the options. For example, if an item relates to the association of a
term with its definition, it is better to include the definition in the stem and several
terms as options rather than to present the term in the stem and several definitions
as alternatives.

5. Unnecessary words or phrases should not be included in the stem. Such words
add to the length and complexity of the stem but do not enhance meaningfulness
of the stem. The stem should be written in simple, concise and clear form.

6. Avoid the use of negative words in the stem of the item. There are times when it
is important for the examinee to detect errors or to know exceptions; for these
purposes, the use of ‘not’ or ‘except’ is sometimes justified in the stem. When a
negative word is used in a stem it should be highlighted.

7. Use novel material in formulating problems to measure understanding or the ability to
apply principles. Do not focus so closely on rote memory of the text that measurement
of the ability to use information is neglected.

8. Use plausible distracters as alternatives. If an examinee who does not know the
correct answer is not distracted by a given alternative, that alternative is not
plausible and it will add nothing to the functioning of the item.

9. Be sure that no unintentional clues to the correct answer are given in the stem or
the options.

10. The correct answer should appear at each position in almost equal numbers.
While constructing multiple-choice items, some examiners have a tendency to
place the correct alternative in the first position, some in the middle, and others
at the end. Such tendencies should be consciously controlled (a quick positional
audit is sketched after this list).

11. Avoid using ‘none of the above’, ‘all of the above’, ‘both a and b’, etc. as options
for an MCQ.

12. Alternatives should be grammatically consistent with the stem. Grammatical
inconsistency provides irrelevant clues.
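
As noted in rule 10, the positions of the keyed answers should be audited before the test is finalized. A minimal sketch of such an audit; the answer key below is hypothetical and the flagging threshold is an arbitrary illustrative choice:

```python
from collections import Counter

# Hypothetical answer key for a 20-item draft test.
key = list("ABDCADACABDACABACDAC")

counts = Counter(key)
expected = len(key) / len(set(key))  # ideal number of keyed answers per position

for position in sorted(counts):
    flag = "  <-- check placement" if abs(counts[position] - expected) > 2 else ""
    print(f"Option {position}: keyed {counts[position]} times{flag}")
```

Here option A would be flagged as keyed far more often than the others, prompting the examiner to reshuffle some items.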

A variety of multiple choice items

Matching Exercises

 A matching exercise consists of two parallel columns, with each word, number, or
symbol in one column being matched to a word, sentence, or phrase in the other
column.

 Items in the column for which a match is sought are called premises, and the items in
the column from which the selection is made are called responses.

Uses of Matching Exercises

 When you have a number of questions of the same type (homogeneous), it is
advisable to frame a matching item in place of a number of similar MCQs.

 Whenever learning outcomes emphasize the ability to identify the relationship
between two things and a sufficient number of homogeneous premises and
responses can be obtained, a matching exercise seems most appropriate.

 Hence it is suggested to use only homogeneous material in a single matching
exercise, e.g.:
 Inventions and inventors
 Authors and books
 Scientists and their contributions

 The major advantage of the matching exercise is its compact form, which makes it
possible to measure a large amount of related factual material in a relatively short
time.

Limitations

 It is restricted to the measurement of factual information.

 Another limitation is the difficulty of finding homogeneous material that is
significant from the viewpoint of our objectives and learning outcomes.

Suggestions for Constructing Matching Exercises

 Use only homogeneous material in a single matching exercise.


 Include an unequal number of responses and premises and instruct the student that
responses may be used once, more than once, or not at all.
 Keep the list of items to be matched brief, and place the shorter responses on the
right.
 Arrange the list of responses in logical order. Place words in alphabetical order and
numbers in sequence.
 Indicate in the directions the basis for matching the responses and premises; this
avoids ambiguity and confusion and saves testing time.
 Place all of the items for one matching exercise on the same page.
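
Several of these suggestions can be checked mechanically on a drafted exercise. A minimal sketch; the premises and responses below are hypothetical, and the checks simply mirror the rules above:

```python
# Hypothetical matching exercise: inventions (premises) and inventors (responses).
premises = ["Telephone", "Printing press", "Light bulb"]
responses = ["Bell", "Edison", "Gutenberg", "Marconi", "Whitney"]

checks = {
    "more responses than premises": len(responses) > len(premises),
    "responses in alphabetical order": responses == sorted(responses),
    "list kept brief (10 or fewer premises)": len(premises) <= 10,
}
for rule, passed in checks.items():
    print(("OK  " if passed else "FIX ") + rule)
```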

Context Dependent Items

A variety of multiple choice items may be used to measure higher-level learning
achievement, such as comprehension, interpretation, extrapolation, application, reasoning
and analysis, and to help the students focus more on the items/test.

The most commonly used variation is the Context Dependent Item. The selection of context
or stimuli is made in accordance with the nature of the discipline /subject and the learning
outcome to be measured. The context may be in the form of a:
 Paragraph
 Diagram
 Graph

 Picture

One context/stimulus may be followed by one or more multiple choice items. Some
examples of context-dependent items are stated below:

Paragraph as a Context
A paragraph as a context is used to measure learning outcomes relating to reading
comprehension, i.e. understanding the meaning/theme of the paragraph, understanding
contextual meanings of words, and relating and synthesizing various parts of the information
given in a paragraph.

Diagram/Picture as a Context

 The questions using diagram may measure not only knowledge but understanding
and application as well.

 Like other contexts, a diagram may be followed by a number of MC items; it may also
require the examinee to label various specified parts of the diagram, or even ask
about their functions.

Graph as a Context
Reading and interpreting graphs is an ability that is useful in most social and
physical sciences. MCQs may be asked to assess the desired achievement in the
respective field.

Step 2: Preparing the Test According to Plan

Item Development:
Items that can be scored Objectively: True false, matching, and multiple-choice type
and their variations

In preparing an objective test, care should be taken to make the items clear, precise,
grammatically correct, and written in language suitable to the reading level of the group for
whom the test is intended. All information and qualifications needed to select a reasonable
answer should be included, but non-functional or stereotyped words and phrases should be
avoided.

General recommendations that apply to all kinds of test exercises:


 Keep the test plan in view as test exercises are written. Items should be addressed to
the cells in the blueprint / the test plan.
 Draft the test items some time in advance, and then review them.
 Have test items examined and critiqued, in the light of the rules for writing items, by
one or more colleagues.
 Check that the item has an answer that would be agreed upon by experts. If possible,
one of the experts may take the test so that the responses of the expert can be
compared with the keyed answers. This way any error can be detected and the test
developer may put confidence in the key thus finalized.
 Prepare a surplus of test exercises so that an adequate number of good items will
be available for the final version of the test.

Assembling the Test
After the items have been written and selected, they are organized in the form of a test.

Arranging Items in the test


Items of the same format may be placed together. Each item type requires specific set of
directions and a somewhat different mental set on the part of the examinee.

So far as possible, within item type, items dealing with the same content may be grouped
together. The examinee will be able to concentrate on a single domain at a time rather than
having to shift back and forth among areas of content. Furthermore, the examiner will have
an easier job of analyzing the results, as it will be easier to see at a glance whether the
errors are more frequent in one content area than the other.

Items may be so arranged that difficulty progresses from easy to hard.

Items should be arranged in the test booklet so that answers follow no set pattern.
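
These arrangement rules amount to a simple sort over the drafted items. A minimal sketch, assuming each item carries a format, a content area, and an estimated difficulty; all fields and values are hypothetical:

```python
# Hypothetical drafted items: (format, content_area, estimated_difficulty from 0 easy to 1 hard)
items = [
    ("true-false", "fractions", 0.3),
    ("multiple-choice", "decimals", 0.8),
    ("true-false", "decimals", 0.5),
    ("multiple-choice", "fractions", 0.4),
]

format_order = {"true-false": 0, "matching": 1, "multiple-choice": 2}

# Group by item format, then by content area, then from easy to hard within each group.
arranged = sorted(items, key=lambda it: (format_order[it[0]], it[1], it[2]))
for fmt, area, difficulty in arranged:
    print(f"{fmt:16} {area:10} difficulty {difficulty}")
```

The final check, that answers follow no set pattern, still needs an audit like the one sketched earlier for multiple-choice keys.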

Test Instructions
The directions should be simple but complete. They should indicate the purpose of the test,
the time limits and the score value of each question. Write a set of directions for each item
type used on the test, specifying what the respondent is expected to do and how the
responses are to be recorded.

Answer Sheets
Separate answer sheets, which are easier to score, can be used at high school level and
beyond.

Test Length
Make sure that the length of the test is appropriate for the time limits.

After the items have been reviewed and tentatively selected, it is important to see that the
items measure a representative sample of learning objectives and course content included in
the test plan. Agreement between the test plan and the test would ensure content validity of
the test.

Step 3: Test Administration and Use

Administration
All pupils must be given a fair chance to demonstrate their achievement. The physical and
psychological environment should be conducive to their best efforts. Control all factors that
might interfere with valid measurement: adequate workspace, quiet, proper light and
ventilation are important. Pupils must be put at ease; tension and anxiety should be reduced
to the minimum.

Scoring the test


Scoring Key

If the pupils' answers are recorded on the test paper, the teacher may make a scoring key by
marking the correct answers on a blank copy of the test. When separate answer sheets are
used, a scoring stencil may be prepared: a blank answer sheet with holes punched where the
correct answers should appear. Before the scoring stencil is used, each test paper should
also be scanned to
make sure that only one answer was marked for each item. Any item containing more than
one answer should be eliminated from scoring.

In scoring objective tests, each correct answer is usually counted as one point. When pupils
are told to answer every item on the test, a pupil's score is simply the number of items
answered correctly.
Short answer questions may sometime require awarding partial credit and may pose some
problem in scoring. However, a detailed key may be prepared in advance to avoid confusion.

For each question, and for the test as a whole, the examiner may make a tally of each kind
of error that the examinees make. A summary of these errors can then be used to plan
instructional activities.
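
Scoring against the key and tallying errors per item can be done in a single pass over the answer sheets. A minimal sketch; the key and the pupils' responses below are hypothetical:

```python
from collections import Counter

key = ["A", "C", "B", "D", "A"]
responses = {
    "pupil_1": ["A", "C", "D", "D", "A"],
    "pupil_2": ["B", "C", "B", "D", "C"],
}

error_tally = Counter()  # item number -> how many pupils missed it
for pupil, answers in responses.items():
    score = sum(a == k for a, k in zip(answers, key))  # one point per correct answer
    for i, (a, k) in enumerate(zip(answers, key), start=1):
        if a != k:
            error_tally[i] += 1
    print(f"{pupil}: {score}/{len(key)}")

print("Errors per item:", dict(error_tally))
```

Items that many pupils miss stand out immediately in the tally and can feed directly into the instructional planning mentioned above.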

Results

The raw score obtained by a pupil has no meaning by itself and cannot be directly
interpreted. If a student obtains 75 marks out of 100, it tells us neither how s/he compares
with other students, nor what s/he knows, nor what s/he does not know. The simplest form of
meaning that teachers often provide to the test score is by assigning ranks to the scores. A
rank, however, becomes more interpretable when one knows the total number of students in
the class. Often grades are assigned to give meaning to the raw scores by comparing
individual performance with the whole group that has taken the test; in educational
institutions this is most often the class. For criterion-referenced tests, of course, absolute
grading is used.

Awarding Grades

Relative grading
Letter grades are typically assigned on the basis of performance in relation to other group
members. Some teachers assign them on the normal curve, but experts (Linn and Gronlund)
do not recommend this, for classes are usually too small to approximate a normal curve.
They suggest that before letter grades are assigned, the proportions of As, Bs, Cs, Ds, and
Fs to be used should be determined. This must be done in the light of a consistent policy of
the institution or the system.

The following distribution is recommended for an introductory course for the purpose of
illustration only.
A = 10-20 % of the students.
B = 20-30 % of the students.
C = 30-50 % of the students.
D = 10-20 % of the students.
F = 0-10 % of the students.
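
Applying such a distribution is a matter of ranking the scores and cutting the ranked list at the chosen proportions. A minimal sketch; the proportions (15% A, 25% B, 40% C, 15% D, 5% F) and the scores are purely illustrative:

```python
scores = {"Ali": 82, "Sara": 74, "Bilal": 69, "Hina": 61, "Omar": 55,
          "Zara": 50, "Asad": 47, "Noor": 40, "Adeel": 33, "Maha": 25}

proportions = [("A", 0.15), ("B", 0.25), ("C", 0.40), ("D", 0.15), ("F", 0.05)]

ranked = sorted(scores, key=scores.get, reverse=True)  # highest score first
grades, start = {}, 0
for letter, share in proportions:
    end = start + round(share * len(ranked))
    for name in ranked[start:end]:
        grades[name] = letter
    start = end
for name in ranked[len(grades):]:
    grades[name] = "F"  # any remainder left by rounding falls in the lowest grade

print(grades)
```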

The educational institution should decide on a consistent grading policy for all its
departments. Grades may be awarded on the basis of percentiles, or a standard score
system may be used.

 In relative grading, grades provide meaning to the scores in terms of performance in
reference to the group.
 When grades are assigned to the obtained scores, raw scores lose their
significance.
 In most systems where letter grades are used, grades are assigned numerical
values, such as A = 4; B = 3; C = 2; D = 1; and F = 0 or fail.

 The grade point for a course is obtained by multiplying the grade value by its credit
hours.
 Finally, the grade point average (GPA: the average of the grade points for all the
courses) is found.
 The GPA, a numerical value, is often converted into an equivalent letter grade.
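
The grade-point arithmetic is easy to make concrete. A minimal sketch, assuming the usual A = 4 … F = 0 values and the conventional credit-hour-weighted average; the transcript is hypothetical:

```python
grade_values = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}

# Hypothetical transcript: course -> (letter grade, credit hours)
transcript = {"Maths": ("A", 3), "Physics": ("B", 4), "English": ("C", 2)}

# Grade point for a course = grade value x credit hours; GPA = total points / total hours.
total_points = sum(grade_values[g] * hours for g, hours in transcript.values())
total_hours = sum(hours for _, hours in transcript.values())

gpa = total_points / total_hours
print(f"GPA = {gpa:.2f}")  # (4*3 + 3*4 + 2*2) / 9 = 28/9, about 3.11
```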


Absolute grading
Assigning grades on an absolute basis involves comparing a pupil's performance to
pre-specified standards set by the teacher. These standards are usually concerned with the
degree of mastery to be achieved, and may be specified as the percentage of correct
answers to be obtained on a test designed to measure a clearly defined set of learning
tasks (competencies), as on a criterion-referenced test.

A common practice, though not scientific
There are instances where pre specified standards are used to assign letter grades directly
on the basis of raw scores. For example, in Pakistan the Boards of Secondary and
Intermediate Education assign:
A1 on 80 % marks or beyond
A on 70 – 79 % marks
B on 60 – 69 % marks …
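
Mechanically, such banded grading is a lookup over percentage ranges, as the following minimal sketch shows. Only the A1, A and B cut-offs come from the bands quoted above; the lower cut-offs are illustrative assumptions:

```python
def board_grade(percent: float) -> str:
    # A1 >= 80, A = 70-79, B = 60-69 as quoted above; bands below 60 are assumed.
    bands = [(80, "A1"), (70, "A"), (60, "B"), (50, "C"), (40, "D"), (33, "E")]
    for cutoff, grade in bands:
        if percent >= cutoff:
            return grade
    return "Fail"

for marks in (85, 72, 64, 30):
    print(marks, "->", board_grade(marks))
```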
These grades at best tell us that the score obtained by a student lies within a certain
range. Like a raw score, such a grade tells us neither how a pupil compares with other
students, nor what he knows, nor what he does not know.

Experts in the field do not recognize this system as absolute grading, nor does it fall in
the category of relative grading. Though not scientifically recognized, this system of grading
is practiced in many educational settings.

Review Test Results


Assigning and communicating the grades to the class is not enough. It is important that the
teacher returns and reviews test results with the students. Feedback on performance has
special value for motivating students to improve. Moreover, learning from one's mistakes
is usually very effective.

How to Prepare Effective Essay Questions?


Objectives of the Sessions

1. To provide faculty with information and guidelines that help them better utilize the
advantages of essay questions in assessing student performance, and to provide
guidelines for dealing with the challenges of essay questions.
2. To help understand the main advantages and limitations of essay questions and
common misconceptions associated with their use.
3. To help distinguish between learning outcomes that are appropriately assessed by
using essay questions and outcomes that are likely to be better assessed by other
means.
4. Evaluate existing essay questions using commonly accepted criteria.
5. Improve poorly written essay questions by using the information in this booklet to
identify flaws in existing questions and correct them.
6. Construct well-written essay questions that assess given objectives.

How to Prepare Better Essay Questions?


What is an Essay Question?

There are two major purposes for using essay questions that address different learning
outcomes. One purpose is to assess students' understanding of subject-matter content. The
other purpose is to assess students' writing abilities. These two purposes are so different in
nature that it is best to treat them separately.

An essay question is "…a test item which requires a response composed by the examinee,
usually in the form of one or more sentences, of a nature that no single response or pattern
of responses can be listed as correct, and the accuracy and quality of which can be judged
subjectively only by one skilled or informed in the subject."

An essay question should meet the following criteria:

1. Requires examinees to compose rather than select their response.


Multiple-choice questions, matching exercises, and true-false items are all examples of
selected response test items because they require students to select an answer from a
list of possibilities provided by the test maker, whereas essay questions require students
to construct their own answer.

2. Elicits student responses that must consist of one or more sentences.


Does the following example require student responses to consist of one or more
sentences?

Example A

How do you feel about the removal of prayer from the public school system?

In Example A, it is possible for a student to answer the question in one word. For instance, a
student could write an answer like "good" or "bad". Moreover, this is a poor example for
testing purposes because there is no basis for grading students’ personal preferences and
feelings.

The following example improves upon the given example in such a way that it elicits a
response of one or more sentences that can be graded.

Consider the following argument in favor of organized prayer in school.

School prayer should be allowed because national polls repeatedly indicate that the
majority of Pakistanis are in favor of school prayers. Moreover, statistics show a
steady moral decline in the country since the banning of organized prayer in school.
Drug use, the divorce rate, and violent crime have all increased since the banning of
organized prayer in school.

Analyze the argument by explaining which assumptions underlie the argument.

3. No single response or single response pattern is correct.

Which example question below allows for a variety of correct answers?

Example B or Example C

Example B
What was the full name of the man who assassinated President Abraham Lincoln?

Example C
State the full name of the man who assassinated President Abraham Lincoln and explain
why he committed the murder.

There is just one single correct answer to Example B because the students need to
write the full name of the man who assassinated President Abraham Lincoln. The
question assesses verbatim recall or memory and not the ability to think. For this
reason, Example B would not be considered a typical essay question. Example C
assesses students’ understanding of the assassination and it is more effective at
providing students the opportunity to think and to give a variety of answers. Answers
to this question may vary in length, structure, etc.

4. The accuracy and quality of students' responses to essays must be judged
subjectively by a competent specialist in the subject.

The nature of essay questions is such that only competent specialists in the subject can
judge to what degree student responses to an essay are complete, accurate, correct,
and free from extraneous information. Ineffective essay questions allow students to
generalize in their responses without being specific and thoughtful about the content
matter.

Effective essay questions elicit a depth of thought from students that can only be judged
by someone with the appropriate experience and expertise in the content matter. Thus,
content expertise is essential for both writing and grading essay questions.

Which of the following sample questions prompts student responses that can only be judged
subjectively by a subject matter expert?

Example D
Explain how Arabs treated women before the advent of Islam.

Example E
As mentioned in class, list the main ways women were treated before the advent of Islam.

In order to grade a student's response to the above examples, the grader needs to know the
ways women were treated in the specified period in Arab countries. It takes subject-matter
expertise to grade an essay response to such a question.

Review: What is an Essay Question?

An essay question is a test item which contains the following four elements:
1. Requires examinees to compose rather than select their response.
2. Elicits student responses that consist of one or more sentences.
3. No single response or single response pattern is correct.
4. The accuracy and quality of students’ responses to essays must be judged
subjectively by a competent specialist in the subject.
Advantages, Limitations, and Common Misconceptions of Essay Questions

In order to use essay questions effectively, it is important to understand the following
advantages, limitations and common misconceptions of essay questions.

Advantages

1. Essay questions provide an effective way of assessing complex learning outcomes
that cannot be assessed by other commonly used paper-and-pencil assessment
procedures.

Essay questions allow you to assess students' ability to synthesize ideas, to
organize and express ideas, and to evaluate the worth of ideas. These abilities
cannot be effectively assessed directly with other paper-and-pencil test items.

2. Essay questions allow students to demonstrate their reasoning.

Essay questions not only allow students to present an answer to a question but also
to explain how they arrived at their conclusion. This allows teachers to gain insights
into a student's way of viewing and solving problems. With such insights teachers are
able to detect problems students may have with their reasoning process and help
them overcome those problems.

3. Essay questions provide authentic experience. Constructed responses are closer to
real life than selected responses.

Problem solving and decision-making are vital life competencies. In most cases,
these skills require the ability to construct a solution or decision rather than selecting
a solution or decision from a limited set of possibilities. It is not very likely that an
employer or customer will give a list of four options to choose from when he/she asks
for a problem to be solved. In most cases, a constructed response will be required.
Hence, essay items are closer to real life than selected response items because in
real life students typically construct responses, not select them.

Limitations

1. Essay questions necessitate testing a limited sample of the subject matter, thereby
reducing content validity.
2. Essay questions have limitations in reliability.
3. Essay questions require more time for scoring student responses.
4. Essay questions provide practice in poor or unpolished writing.

Common Misconceptions

1. By their very nature essay questions assess higher-order thinking.

Whether or not an essay item assesses higher-order thinking depends on the design of the
question and how students’ responses are scored. An essay question does not automatically
assess higher-order thinking skills. It is possible to write essay questions that simply assess
recall. Also, if a teacher designs an essay question meant to assess higher-order thinking
but then scores students' responses in a way that only rewards recall ability, that teacher is
not assessing higher-order thinking.

Exercise
Compare the following two examples and decide which one assesses higher-order
thinking skills.

Example A
What are the major advantages and limitations of essay questions?

Example B
Given their advantages and limitations, should an essay question be used to assess the
following intended learning outcome? In answering this question provide brief
explanations of the major advantages and limitations of essay questions. Clearly state
whether you think an essay question should be used to assess students’ achievement of
the given intended learning outcome and explain the reasoning for your judgment.

Intended learning outcome: Evaluate the reasons why the nursing process is an effective
process for serving clients.

Example A assesses recall of factual knowledge, whereas Example B requires more of
students. It requires students to recall facts, to make an evaluative judgment, and to explain
the reasoning for their judgment.

2. Essay questions are easy to construct.

Essay questions are easier to construct than multiple-choice items because teachers
don't have to create effective distracters. However, that doesn’t mean that good
essay questions are easy to construct. They may be easier to construct in a relative
sense, but they still require a lot of effort and time. Essay questions that are hastily
constructed without much thought and review usually function poorly.

3. The use of essay questions eliminates the problem of guessing.

One of the drawbacks of selected response items is that students sometimes get the
right answer by guessing which of the presented options is correct. This problem
does not exist with essay questions because students need to generate the answer
rather than identifying it from a set of options provided. At the same time, the use of
essay questions introduces bluffing, another form of guessing. Some students are
adept at using various methods of bluffing (vague generalities, padding, name-
dropping, etc.) to add credibility to an otherwise vacuous answer. Thus, the use of
essay questions changes the nature of the guessing that occurs, but does not
eliminate it.

4. Essay questions benefit all students by placing emphasis on the importance of
written communication skills.

Written communication is a life competency that is required for effective and
successful performance in many vocations. Essay questions challenge students to
organize and express subject matter and problem solutions in their own words,
thereby giving them a chance to practice written communication skills that will be
helpful to them in future vocational responsibilities. At the same time, the focus on
written communication skills is also a serious disadvantage for students who have
marginal writing skills but know the subject-matter being assessed. To the degree
that students who are knowledgeable in the subject obtain low scores because of
their inability to write well, the validity of the test scores will be diminished.

5. Essay questions encourage students to prepare more thoroughly.

Some research seems to indicate that students are more thorough in their
preparation for essay questions than in their preparation for objective examinations
such as multiple choice questions.

Review: Advantages, Limitations, and Common Misconceptions of Essay Questions
Advantages
Essay Questions:
1. Provide an effective way of assessing complex learning outcomes that cannot be
assessed by other commonly used paper-and-pencil assessment procedures.
2. Allow students to demonstrate their reasoning.
3. Provide authentic experience. Constructed responses are closer to real life than
selected responses.
Limitations
Essay Questions:
1. Necessitate testing a limited sample of the subject matter, thereby reducing
content validity.
2. Have limitations in reliability.
3. Require more time for scoring student responses.
4. Provide practice in poor or unpolished writing.
Common Misconceptions
1. By their very nature essay questions assess higher-order thinking.
2. Essay questions are easy to construct.
3. The use of essay questions eliminates the problem of guessing.
4. Essay questions benefit all students by placing emphasis on the importance of
written communication skills.
5. Essay questions encourage students to prepare more thoroughly.

When to use Essay Questions?

 When it is more important that the students construct rather than select the answer.
 When a teacher has sufficient resources and/or help (time, teaching assistants) to
score the student responses to the essay question(s)
 When “the group to be tested is small.”
 When a teacher is “more confident of [his/her] ability as a critical and fair reader than
as an imaginative writer of good objective test items.”

Concerning the ranking of students based on test scores, teachers should know that
some research suggests that students are ranked about the same on essay questions and
multiple-choice questions when test results are compared (Chase & Jacobs, 1992).

Intended Learning Outcomes and the Kind of Item Suited to Them
(Objective items / Objective or essay / Essay)

Students will:
1. Analyze the function of humor in Shakespeare's “Romeo and Juliet”. (Objective or essay)
2. Describe the attributes of a democracy. (Objective or essay)
3. Distinguish between learning outcomes appropriately assessed using essay questions
and outcomes better assessed by some other means. (Objective or essay)
4. Evaluate the impact of the Industrial Revolution on the family. (Essay)
5. Know the definition for the Law of Demand. (Objective items)
6. Predict the outcome of an experiment. (Objective or essay)
7. Propose a solution for the disposal of batteries that is friendly to users and the
environment. (Essay)
8. Recall the major functions of the human heart. (Objective items)
9. Understand the “Golden Rule”. (Objective or essay)
10. Use a theory in literature to analyze a poem. (Essay)

The directive verb in each intended learning outcome provides clues about the method of
assessment that should be used. This can be seen when taking a closer look at some of the
sample intended learning outcomes provided on this page. For example, the verb “recall”
means to retrieve relevant knowledge from long-term memory. Students’ ability to recall
relevant knowledge can be most conveniently assessed through objectively scored test
items. There is no need for students to explain or justify their answer when they are
assessed on recall.

The verb “analyze” means to determine how parts relate to one another and to an overall
structure or purpose. Students can demonstrate their ability to analyze the function of humor
in Shakespeare’s “Romeo and Juliet” by either describing the function of humor in their own
words or by selecting the right or best answer among different options of a well drafted
multiple choice item.

The verb “evaluate” means to make judgments based on criteria and standards. To
effectively assess students’ ability to evaluate the impact of the Industrial Revolution on the
family, the assessment item needs to provide students with the opportunity to not only make
an evaluative judgment but to also explain how they have arrived at their judgment. Hence,
students’ ability to evaluate should be assessed with essay items because they allow
students to explain the rationale for their judgment.

Review: When to Use Essay Questions

It is appropriate to use Essay questions for the following purposes:

 To assess students' understanding of subject-matter content

 To assess higher-order thinking skills that cannot be adequately assessed by
objectively scored test items.
 To assess students' ability to construct rather than to select answers

If an intended learning outcome could be either assessed through objective items or essay
questions, use essay questions for the following situations:

 When it is more important that the students construct rather than select the
answer
 When a teacher has sufficient resources and/or help (time, teaching assistants)
to score the student responses to the essay question(s)
 When the group to be tested is small.
 When a teacher is more confident of his/her ability as a critical and fair reader
than as an imaginative writer of good objective test items

Guidelines for Constructing Essay Questions

Students should have a clear idea of what they are expected to do after they have read the
problem presented in an essay item. Below are specific guidelines that can help to improve
existing essay questions and create new ones.

1. Clearly define the intended learning outcome to be assessed by the item.

Knowing the intended learning outcome is crucial for designing essay questions.
If the outcome to be assessed is not clear, it is likely that the question will assess
some skill, ability, or trait other than the one intended.

In specifying the intended learning outcome teachers clarify the performance that
students should be able to demonstrate as a result of what they have learned. The
intended learning outcome typically begins with a directive verb and describes the
observable behavior, action or outcome that students should demonstrate. The focus is
on what students should be able to do and not on the learning or teaching process.

Reviewing a list of directive verbs can help to clarify what ability students should
demonstrate and to clearly define the intended learning outcome to be assessed.

2. Avoid using essay questions for intended learning outcomes that are better
assessed with other kinds of assessment.

Some types of learning outcomes can be more efficiently and more reliably assessed
with selected-response questions than with essay questions. In addition, some complex
learning outcomes can be more directly assessed with performance assessment than
with essay questions. Since essay questions sample a limited range of subject-matter
content, are more time consuming to score, and involve greater subjectivity in scoring,
the use of essay questions should be reserved for learning outcomes that cannot be
better assessed by some other means.

3. Define the task and shape the problem situation.

Essay questions have two variable elements—the degree to which the task is structured
versus unstructured and the degree to which the scope of the context is focused or
unfocused. Although it is true that essay questions should always provide students with
structure and focus for their responses, it is not necessarily true that more structure and
more focus are better than less structure and less focus. When using more structure in
essay questions, teachers are trying to avoid at least two problems. More structure helps
to avoid the problem of student responses containing ideas that were not meant to be
assessed and the problem of extreme subjectivity when scoring responses. Although
more structure helps to avoid these problems, how much and what kind of structure and
focus to provide is dependent on the intended learning outcome that is to be assessed
by the essay question and the purpose for which the essay question is to be used.

The process of writing effective essay questions involves defining the task and delimiting
the scope of the task in an effort to create an effective question that is aligned with the
intended learning outcome to be assessed by it. This alignment is absolutely necessary
for eliciting student responses that can be accepted as evidence for determining the
students’ achievement of the intended learning outcome. Hence, the essay question
must be carefully and thoughtfully written in such a way that it elicits student responses
that provide the teacher with valid and reliable evidence about the students’ achievement
of the intended learning outcome.

Failure to establish adequate and effective limits for the student response to the essay
question allows students to set their own boundaries for their response, meaning that
students might provide responses that are outside of the intended task or that only
address a part of the intended task. If students’ failure to answer within the intended
limits of the essay question can be ascribed to poor or ineffective wording of the task, the
teacher is left with unreliable and invalid information about the students’ achievement of
the intended learning outcome and has no basis for grading the student responses.
Therefore, it is the responsibility of the teacher to write essay questions in such a way
that they provide students with clear boundaries for their response.

Task(s) and problem(s) are the key elements of essay questions. The task will specify the
performance students should exhibit when responding to the essay question. A task is
composed of a directive verb and the object of that verb. For example, consider the following
tasks:

i. Task = Justify (Directive verb) the view you prefer (object of the Verb)

ii. Task = Defend (Directive verb) the theory as the most suitable for the situation
(object of the verb)

Tasks for essay questions are not developed from scratch, but are developed based on
the intended learning outcome to be assessed. In essay questions, the task can be
presented either in the form of a direct question or an imperative statement. If written as
a question, then it must be readily translatable into the form of an imperative statement.
For example, the following illustrates the same essay item twice, once as a question and
once as an imperative statement.

Question: How are the processes of increasing production and improving quality in a
manufacturing plant similar or different based on cost?

Imperative statement: Compare and contrast the processes of increasing production
and improving quality in a manufacturing plant based on cost.

Whether essay questions are written as imperative statements or questions, they should
be written to align with the intended outcome and in such a way that the task is clear to
the students.

The other key element of essay questions is the “problem.” The “problem” in essay
questions includes the unsettled matter or undesirable state of affairs that needs to be
resolved. The purpose of the problem is to provide the students with a context within
which they can demonstrate the performance to be assessed. Ideally, students would not
have previously encountered the specific problem.

Problems within essay questions differ in the complexity of thinking processes they elicit
from students depending on the intended learning outcome to be assessed. For
example, if the intended outcome is to assess basic recall, the essay question could be
to summarize views as given in class concerning a particular conflict. The thinking
process in this case is fairly simple. Students merely need to recall what was mentioned
and discussed in class. Yet consider the problem within the essay question meant to
assess students’ abilities to evaluate a particular conflict and to justify their reasoning.
This problem is more complex. In this case, students have to recall facts about the
conflict, understand the conflict, make judgments about the conflict and justify their
reasoning.

Depending on the intended learning outcome to be assessed, teachers may take
different approaches to develop the problem within an essay question. In some cases,
the intended outcome can be assessed well using a “problem” that is inherent in the task
of the essay question.

Example:

Intended Learning Outcome: Understand the interrelationship of grade histories,
student reviews and course schedules for students' selection of a course and
professor.

Essay Question: Explain the interrelationship of grade histories, student reviews and
course schedules for students’ selection of a course and professor.

In the example essay question, the problem is inherent in the task of the question and is
sufficiently developed. The problem for students is to translate into their own words the
interrelationships of certain factors affecting how students select courses.

For intended learning outcomes meant to assess more complex thinking, often a “problem
situation” is developed. The problem situation consists of a problem that students have not
previously encountered and that presents some unresolved matter or undesirable state of
affairs. The purpose of building a problem situation into an essay question is to confront
students with a new context requiring them to assess the situation and derive an acceptable
solution by using:

1. Their knowledge of the relevant subject matter, and

2. Their reasoning skills.

Intended learning outcome: Analyze the impact of the War on Terror on the Pakistani
economy.

Less effective essay question:
Describe the impact of the War on Terror on the Pakistani economy.

More effective essay question:
Analyze the impact of the War on Terror on the Pakistani economy by describing how
different effects of the war work together to influence the economy.

Example of an Evolving Essay Question that Becomes More Focused

1. Less focused essay question: Evaluate the impact of the Industrial Revolution on
England.
2. More focused essay question: Evaluate the impact of the Industrial Revolution on the
family in England.

4. Helpful Instructions: Specify the relative point value and the approximate time
limit in clear directions.

Specifying the relative point value and the approximate time limit helps students allocate
their time in answering several essay questions because the directions clarify the relative
merit of each essay question. Without such guidelines students may feel at a loss as to
how much time to spend on a question. When deciding the guidelines for how much time
should be spent on a question keep the slower students and students with certain
disabilities in mind. Also make sure that students can be realistically expected to provide
an adequate answer in the given and/or the suggested time.

5. Helpful Guidance: State the criteria for grading


Students should know what criteria will be applied to grade their responses. As long as
the criteria are the same for the grading of the different essay questions they don’t have
to be repeated for each essay question but can rather be stated once for all essay
questions. Consider the following example.

Example

All of your responses to essay questions will be graded based on the following criteria:

The content of your answer will be evaluated in terms of the accuracy,
completeness, and relevance of the ideas expressed. The form of your answer
will be evaluated in terms of clarity, organization, correct mechanics (spelling,
punctuation, grammar, capitalization), and legibility.

If both the content and the form of the responses to an essay question are to be graded,
the directions should specify the relative point value for the content and the relative point
value for the form.

Rubric for grading long essay exam questions (10 points possible)

Exemplary (10 points)
 The answer is complete.
 All information provided is accurate.
 The answer demonstrates a deep understanding of the content.
 Writing is well organized, cohesive, and easy to read.

Competent (9 points)
 The answer is missing slight details.
 All information provided is accurate.
 The answer demonstrates understanding of the content.
 Writing is well organized, cohesive, and easy to read.

Minor Flaws (8 points)
 The answer is missing multiple details.
 All information provided is accurate.
 The answer demonstrates basic understanding of the content.
 Writing is organized, cohesive, and easy to read.

Satisfactory (7 points)
 The answer does not address a portion of the question, or major details are missing.
 Almost all information provided is accurate.
 The answer demonstrates basic understanding of the content.
 Writing is organized, cohesive, and easy to read.

Nearly Satisfactory (6 points)
 The answer is lacking major details and/or does not address a portion of the question.
 Most information provided is accurate.
 The answer demonstrates less than basic understanding of the content.
 Writing may be unorganized, not cohesive, and difficult to read.

Fails to Complete (4 points)
 The answer to the question is lacking any detail.
 Some information provided is accurate.
 The answer demonstrates a lack of understanding of the content.
 Writing may be unorganized, not cohesive, and difficult to read.

Unable to Begin Effectively (2 points)
 Question is not answered.
 A small amount to none of the information provided is accurate.
 The answer demonstrates a lack of understanding of the content.
 Writing is unorganized, not cohesive, and very difficult to read.

No Attempt (0 points)
 Answer was left blank.

6. Use several relatively short essay questions rather than one long one.

Only a very limited number of essay questions can be included on a test because of the
time it takes for students to respond to them and the time it takes for teachers to grade
the student responses. This creates a challenge with regard to designing valid essay
questions. Shorter essay questions are better suited to assess the depth of student
learning within a subject whereas longer essay questions are better suited to assess the
breadth of student learning within a subject. Hence, there is a trade-off when choosing
between several short essay questions or one long one. Focus on assessing the depth
of student learning within a subject limits the assessment of the breadth of student
learning within the same subject and focus on assessing the breadth of student learning
within a subject limits the assessment of the depth of student learning within the same
subject.

When choosing between using several short essay questions or one long one also keep
in mind that short essays are generally easier to score than long essay questions.

7. Avoid the use of optional questions

Students should not be permitted to choose one essay question to answer from two or
more optional questions. The use of optional questions should be avoided for the
following reasons:
 Students may waste time deciding on an option.
 Some questions are likely to be harder which could make the comparative
assessment of students' abilities unfair.
 The use of optional questions makes it difficult to evaluate if all students are
equally knowledgeable about topics covered in the test.

8. Improve the essay question through preview and review.

The following steps can help you improve the essay item before and after you hand it
out to your students.

Preview (before handing out the essay question to the students)


a. Predict student responses.

 Try to respond to the question from the perspective of a typical student.

 Evaluate whether students have the content knowledge and the skills
necessary to adequately respond to the question. If possible weaknesses of
the essay question are detected, repair them before handing out the exam.

b. Write a model answer.

 Before using a question, write model answer(s) or at least an outline of the major
points that should be included in an answer. Writing the model answer allows
reflection on the clarity of the essay question. Furthermore, the model
answer(s) serve as a basis for the grading of student responses.
 Once the model answer has been written, compare its alignment with the
essay question and the intended learning outcome, and make changes as
needed to assure that the intended learning outcome, the essay question,
and the model answer are aligned with each other.

c. Ask a knowledgeable colleague to critically review the essay question, the
model answer, and the intended learning outcome for alignment.

 Before using the essay question on a test, ask a person knowledgeable in the
subject (colleague, teaching assistant, etc.) to critically review the essay
question, the model answer, and the intended learning outcome to determine
how well they are aligned with each other. Based on the intended learning
outcome, revise the question as needed. By having someone else look at the
test, the likelihood of creating effective test items is increased, simply
because two minds are usually better than one. Try asking a colleague to
evaluate the essay questions based on the guidelines for constructing essay
questions.

Review (after receiving the student responses)

d. Review student responses to the essay question

After students complete the essay questions, carefully review the range of
answers given and the manner in which students seem to have interpreted the
question. Make revisions based on the findings. Writing good essay questions is
a process that requires time and practice. Carefully studying the student
responses can help to evaluate students' understanding of the question as well
as the effectiveness of the question in assessing the intended learning outcomes.

Review: How to Construct Essay Questions

1. Clearly define the intended learning outcome to be assessed by the item.


2. Avoid using essay questions for intended learning outcomes that are better assessed
with other kinds of assessment.

3. Define the task and shape the problem situation.


a. Clearly define the task.
b. Clearly develop the problem or problem situation.
c. Delimit the scope of the task.

4. Helpful instructions: specify the relative point value and the approximate time limit in
clear directions.

5. Helpful guidance: state the criteria for grading

6. Use several relatively short essay questions rather than one long one

7. Avoid the use of optional questions.

8. Improve the essay question through preview and review.

Preview (before)
a. Predict student responses.
b. Write a model answer.
c. Ask a knowledgeable colleague to critically review the essay question, the
model answer, and the intended learning outcome for alignment.
Review (after)
a. Review student responses to the essay question.

Review Exercise: How to Construct Essay Questions

For each exercise, develop an effective essay question for the given intended learning
outcome. Make sure that the essay question meets the following criteria:
 The essay question matches the intended learning outcome.
 The task is specifically and clearly defined.
 The relative point value and the approximate time limit are specified.

Exercise

Choose an intended learning outcome from a course you are currently teaching and create
an effective essay question to assess students’ achievement of the outcome. Follow each of
the guidelines provided for this exercise. Check off each step on the provided checklist once
you have finished it.

Checklist
1 Clearly define the intended learning outcome to be assessed by the item.
2 Avoid using essay questions for objectives that are better assessed with
  objectively-scored items.
3 Use several relatively short essay questions rather than one long one.
4 The task is appropriately defined and the scope of the task is appropriately limited.
5 Present a novel situation.
6 Consider identifying an audience for the response.
7 Specify the relative point value and the approximate time limit.
8 Predict student responses.
9 Write a model answer.
10 Have a colleague critically review the essay question.


EVALUATION OF ITEMS

Day 4, Session- 1

Often students judge, after taking the exam, whether the test was fair and good. The teacher
is also usually interested in how the test worked for the students. One way to ascertain this
is to undertake item analysis, which provides objective, external and empirical evidence for
the quality of the items we have pre-tested.

The objective of item analysis is to identify problematic or poor items: items which might be
confusing the respondents, which do not have a clearly correct response, or in which a
distracter competes too well with the keyed answer.

A good test has good items, and good test making requires careful attention to the principles
of item evaluation. The basic methods involved are the assessment of item difficulty and item
discrimination. Together these measures comprise item analysis.

Item Analysis
Item analysis is about how difficult an item is and how well it can discriminate between the
good and the poor students. In other words, item analysis provides a numerical assessment
of item difficulty and item discrimination.

Item Difficulty
Item difficulty is determined from the proportion (p) of students who answered each item
correctly. Item difficulty can range from zero (no one solved it) to one hundred (everyone
solved it correctly). The goal is usually to have items of all difficulty levels in the test so that
the test can identify poor, average as well as good students. However, most of the items are
designed to be of average difficulty, because such items are the most useful. The item
analysis exercise provides the difficulty level of each item.

 Optimally difficult items are those that 50%–75% of students answer correctly.
 Items are considered of low to moderate difficulty if (p) is between 70% and 85%.
 Items that only 30% or fewer of students solve correctly are considered difficult ones.

The item difficulty percentage can also be expressed as an Item Difficulty Index by writing it
in decimals, e.g. .40 for an item solved by 40% of the test-takers. The index can thus range
from 0 to 1.
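
As a small illustration (a sketch, not part of the module's materials), the difficulty
index can be computed from scored responses in a couple of lines of Python:

    # Difficulty index (p): proportion of examinees answering the item correctly.
    def difficulty_index(responses):
        """responses: list of 1 (correct) / 0 (incorrect) for one item."""
        return sum(responses) / len(responses)

    # Example: 6 of 10 students answered correctly, so p = 0.6
    print(difficulty_index([1, 1, 1, 0, 1, 1, 0, 0, 1, 0]))  # 0.6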

Items should fall across a variety of difficulty levels in order to differentiate between good
and average as well as average and poor students. Easy items are usually placed in the initial
part of the test to motivate students and alleviate test anxiety.

The optimal item difficulty depends on the question type and number of possible distracters
as well.

Item Discrimination
Another way to evaluate items is to ask “Who gets this item correct: the good, the average or
the weak students?” Assessment of item discrimination answers this query.

Item discrimination refers to the percentage difference in correct responses between the
poor and the high scoring students.

In a small class of 30 students, one can administer the test items, score them and then rank
the students by their overall score.


 Next, we separate the upper 15 students and the lower 15 into two groups: the
UPPER and the LOWER groups.

 Finally, we find how well each item was solved (p) by each group. In other
words, the percentage of students passing (p) each item in each of the two groups is
worked out.

 The discrimination (D) power of the item is then found by taking the difference
between the percentage passing in the upper group and in the lower group. The higher
the difference, the greater the discrimination power of the item.

D = (p of upper group − p of lower group). Also see the following tables.

In a large class of 100 or more students, we take the top 25% and the bottom 25% of students
to form the upper and lower groups, to cut down the amount of work.

The discrimination ratio for an item falls between −1.0 and +1.0. The closer the ratio is to
+1.0, the more effectively that item distinguishes students who know the material (the top
group) from those who don’t (the bottom group).

 An item with a discrimination of 60% or greater is considered a very good item,
whereas a discrimination of less than 20% indicates low discrimination and the item
needs to be revised.
 An item with a negative index of discrimination indicates that the poor students
answer it correctly more often than the good students do. Strange! Such items should
be dropped from the test.

Ten students in a class have taken a ten-item quiz. The students’ responses are shown
below, ordered from high to low total score. The top five students form the high-scoring
group and the bottom five the low-scoring group. A “1” indicates a correct answer; a “0”
indicates an incorrect answer.

Student  Score%  Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10
1         100     1  1  1  1  1  1  1  1  1  1
2          90     1  1  1  1  1  1  1  1  0  1
3          80     1  1  0  1  1  1  1  1  0  0
4          70     0  1  1  1  1  1  0  1  0  1
5          70     1  1  1  0  1  1  1  0  0  1
6          60     1  1  1  0  1  1  0  1  0  0
7          60     0  1  1  0  1  1  0  1  0  1
8          50     0  1  1  1  0  0  1  0  1  0
9          40     1  1  1  0  0  0  0  0  1  1
10         30     0  1  0  0  0  1  0  0  1  0


Difficulty Index and Discrimination Index are calculated below

Item          Correct in high group   Correct in low group   Difficulty %   Discrimination %
Question 1              4                      2                  60               40
Question 2              5                      5                 100                0
Question 3              4                      4                  80                0
Question 4              4                      1                  50               60
Question 5              5                      2                  70               60
Question 6              5                      3                  80               40
Question 7              4                      1                  50               60
Question 8              4                      2                  60               40
Question 9              1                      3                  40              -40
Question 10             4                      2                  60               40
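
The figures above can also be reproduced programmatically. The following Python sketch
(illustrative only, not part of the module) recomputes difficulty and discrimination from
the response matrix of the quiz:

    # Rows: students ordered from highest to lowest total score.
    # Columns: items Q1-Q10; 1 = correct, 0 = incorrect.
    responses = [
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 1, 0, 1],
        [1, 1, 0, 1, 1, 1, 1, 1, 0, 0],
        [0, 1, 1, 1, 1, 1, 0, 1, 0, 1],
        [1, 1, 1, 0, 1, 1, 1, 0, 0, 1],
        [1, 1, 1, 0, 1, 1, 0, 1, 0, 0],
        [0, 1, 1, 0, 1, 1, 0, 1, 0, 1],
        [0, 1, 1, 1, 0, 0, 1, 0, 1, 0],
        [1, 1, 1, 0, 0, 0, 0, 0, 1, 1],
        [0, 1, 0, 0, 0, 1, 0, 0, 1, 0],
    ]

    half = len(responses) // 2
    upper, lower = responses[:half], responses[half:]

    for item in range(len(responses[0])):
        p_upper = sum(s[item] for s in upper) / half   # p in the upper group
        p_lower = sum(s[item] for s in lower) / half   # p in the lower group
        difficulty = (p_upper + p_lower) / 2 * 100     # % of all students correct
        discrimination = (p_upper - p_lower) * 100     # D = p(upper) - p(lower)
        print(f"Q{item + 1}: difficulty {difficulty:.0f}%, "
              f"discrimination {discrimination:+.0f}%")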

 Question 2 was the easiest; question 9 was the most difficult.
 Question 9 also had negative discrimination and should be removed from the
quiz.
 100% discrimination would occur if all those in the upper group answered
correctly and all those in the lower group answered incorrectly.
 Zero discrimination occurs when equal numbers in both groups answer correctly.
 Negative discrimination, a highly undesirable condition, occurs when more
students in the lower group than in the upper group answer correctly.
 Items with a discrimination of 25% and above are considered good.

Discrimination by Item Difficulty Graph

Both difficulty and discrimination indices are important parameters which influence each
other. A two-way chart indicating the relationship between the two indices is shown below.

Students’ responses on the 10-item quiz shown above can be presented on a chart.

Difficulty by Discrimination Chart indicating overall efficacy of the quiz

Discrimination %    Items (Difficulty %)
 60                 Item 4 (50), Item 7 (50), Item 5 (70)
 40                 Item 1 (60), Item 8 (60), Item 10 (60), Item 6 (80)
  0                 Item 3 (80), Item 2 (100)
-40                 Item 9 (40)
We find from the chart that items of medium difficulty are more discriminating and useful,
unlike the too-difficult item (9) and the too-easy items (2, 3).

Interpreting Distracter Values

 Ideally, distracters should be equally attractive, but none should be more
attractive than the keyed answer.
 At a minimum, each distracter should be chosen by at least 5% of the examinees.


 Weak or non-functional distracters may be substituted with new ones; make sure
that the new distracters align with the stem as well as the objective of the item,
connect well with the rest, and are grammatically correct.

Effectiveness of Distracters
Difficulty and discrimination indices are estimates about an item, which overall comprises a
stem and a set of options including distracters (Appendix-A). Item analysis statistics thus
reflect the goodness of both the distracters and the stem. Let us look at the guidelines which
can help us improve them.

1. Most MCQs have 2-4 distracters; 3 is better, and 4 is best at the college level. Where
it is difficult to think of more than one distracter, frame the item as a true/false item.
2. Distracters that have less than a 5 percent response rate are weak and may be
changed or improved. Distracters which attracted no response are not working at all.
3. No distracter should be chosen more often than the keyed response in the upper group.
4. Similarly, no one distracter should pull more than about half the students.
5. If students respond about equally to all the options, they might be marking
randomly or guessing wildly. Critically check the contents of such items. They might
have been written badly, so that the students have no idea what you are asking; or the
item could be very difficult and the students completely baffled.
6. If the low group gets the keyed answer as often as the upper group, all the distracters
should be looked into again. Or drop the item if you have a large pool of items.

Other theoretical points to consider:

7. Do not repeat a phrase in the options if it can be stated in the stem. Thus make the
distracters short and precise.
8. Options should appear on separate lines and be suitably indented.
9. Distracters should be plausible and homogeneous, presented in logical or
numerical order, and independent of one another.
10. Keep the alternatives mutually exclusive, homogeneous in content and free from
clues that might indicate which response is correct. They should moreover be
parallel in form and similar in length.
11. The position of the keyed response should vary among the A, B, C and D positions.
12. Distracters should be related or somehow linked to each other, should appear as
similar as possible to the correct answer, and should not stand out as a result of their
phrasing. If the stem is in the past tense, all the options should be in the past tense. If
the stem calls for a plural answer, all the options should be plural. Stem and options
should have subject-verb agreement, and all options should follow grammatically
from the stem.
13. Options should not include “none of the above” or “all of the above.” “None of the
above” is problematic in items where judgment is involved and where the options are
not absolutely true or false.
14. When more than one option has some element of accuracy but the keyed response
is the best, ask the students to select the “best answer” rather than the “correct
answer.”
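
A simple tally of option choices makes several of these checks concrete. The Python sketch
below is illustrative only (the function and data are hypothetical): it flags distracters
chosen by fewer than 5% of examinees and any distracter that outdraws the keyed answer:

    from collections import Counter

    def review_distracters(choices, key, options=("A", "B", "C", "D")):
        """choices: the option each examinee selected; key: the keyed answer."""
        counts = Counter(choices)
        n = len(choices)
        for opt in options:
            share = counts[opt] / n
            note = ""
            if opt != key and share < 0.05:
                note = "  <- weak / non-functional distracter"
            if opt != key and counts[opt] > counts[key]:
                note = "  <- outdraws the keyed answer; re-examine the item"
            print(f"{opt}: {counts[opt]:3d} ({share:.0%}){note}")

    # 40 examinees, keyed answer 'C'; distracter 'D' attracts almost no one.
    review_distracters(["C"] * 22 + ["A"] * 10 + ["B"] * 7 + ["D"] * 1, key="C")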

Session 2: Item Analysis and Subsequent Decision

Activity: Identifying poor items and ways to improve them

Objectives:
1- To consolidate the preceding presentation
2- To apply principles & conventions of item construction / brainstorming
3- To gain hands-on practice / learning by doing


Method: Interactive Class Discussion

Material: A set of 30 flawed MCQs (Appendix-A)

Outcome: Learning how to detect and remove flaws.

Decisions Subsequent to Item Analysis

1. Revise items to remove flaws, or write alternative items.
2. Does the reviewed pool of items correspond with the original table of specifications
and the stipulated objectives? Discrepancies, if any, have to be removed before using
the test.
3. While assembling a test (out of the pre-tested pool of items), set the items into groups
(parts of the test) with appropriate instructions.
4. Check the scoring key of the revised test.
5. Decide the duration / time of the test for actual use on the basis of:
a) the rate of omitted responses in the pre-test
b) observation of the test administrator
6. Review the scoring / grading scheme, e.g. choose or drop negative marking.
7. Be informed about instructional weaknesses and student misconceptions to prepare
students better in future. You may even coach students in MCQ-solving strategies.

Session 3: Desirable Qualities of a Test

All tests should be valid and reliable, and MCQ-based tests are often strong on these counts
because of several advantages over other examination techniques. Here we discuss four
desirable qualities of a test.

1- Reliability

Reliability tells us whether a test is likely to yield the same results if administered multiple
times to the same group of test takers. In other words, a test is said to be reliable if it
measures consistently.

If there is consistency or homogeneity among questions, it enhances the reliability of the
test. A test may have several parts (mathematics, verbal comprehension, etc.); in that case
the reliability of each part is worked out separately.

Since MCQs have clear-cut, unambiguously correct or incorrect, objectively scorable
answers, there is greater marker reliability in such assessment.

However, no test or measure is perfect. A certain degree of error, called random or chance
error, does creep in. In measuring length with a ruler, for example, there may be random
error associated with your eye's ability to read the markings or extrapolate between them.
In addition, the ruler you use to measure length may itself not be very precise or accurate.

Such factors fluctuate from time to time, adversely influencing students’ performance on the
test. When such chance or random error is kept to a minimum, test scores truly reflect
students’ ability. Reliability, as a statistical estimate, ranges between 0 and 1.

There are three sources of error which adversely influence reliability.


1) Factors in the test itself

 Most tests contain a collection of items that represent different skills; test
contents are therefore usually not homogeneous.
 Length of the test also matters: a test of 50 items will be more reliable than one of
30. Psychometric theory suggests that adding more items should increase the
reliability of the test, provided the added items are good ones.
 Other sources of test error include the effectiveness of the distracters and the
difficulty of the items: items that are too difficult or too easy limit the reliability of the test.

 Variations in testing procedure, such as making changes in the test instructions or
the time limit, can also undermine reliability. Data from such cases should not be pooled.

2) Factors in test-takers
Changes in students' attitudes, health, mood, sleep, etc. can affect the quality of their
efforts and thus their test-taking consistency. For example, test takers may make
careless errors, misinterpret or forget test instructions, inadvertently omit test sections,
or misread test items.

3) Scoring errors
These refer to flaws in scoring rubrics and to a host of rater errors manifested in exams.

How to Estimate Reliability?

Educational tests are reliable when they have homogeneous contents: the performance of
students is consistent on homogeneous test contents, with the poor students performing
poorly throughout the test and vice versa. Among the several methods of estimating
reliability (test-retest, split-half, alternate-form and internal consistency), the last is
particularly salient to scholastic tests. The standard internal-consistency method, formally
called ‘KR-20’, is a statistical method related to item analysis work as well.
________________________________________________________________________
The KR-20 formula statistically works out the reliability of an educational or ability test
once the difficulty level of the items is known, that is, what proportion of students passed
(p) and what proportion failed (q) each item of the test.

KR-20 = (n / (n − 1)) × (SDt² − ∑pq) / SDt²

Where:
n = number of items
p = pass proportion of an item in the total group
q = fail proportion of the item in the total group (q = 1 − p)
SDt² = variance of the total test


________________________________________________________________________
Let us apply this formula to the 10-item quiz that we item-analyzed in the earlier session.

Mean = 6.5     SD = 2.17     SDt² = 2.17² ≈ 4.71

__________________________
Items      p      q      pq
1         .6     .4     .24
2        1.0     .0     .00
3         .8     .2     .16
4         .5     .5     .25
5         .7     .3     .21
6         .8     .2     .16
7         .5     .5     .25
8         .6     .4     .24
9         .4     .6     .24
10        .6     .4     .24
___
∑pq = 1.99
__________

KR-20 = (n / (n − 1)) × (SDt² − ∑pq) / SDt²

Next we put the values into the formula:
        = (10 / 9) × (4.71 − 1.99) / 4.71
        = 1.11 × 2.72 / 4.71
        = 1.11 × .58
        = .64
________________________________________________________________________
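
The same computation can be scripted. Below is a minimal Python sketch (illustrative, not
part of the module's materials), using the p values from the table above:

    # A minimal KR-20 sketch for the 10-item quiz.
    p = [0.6, 1.0, 0.8, 0.5, 0.7, 0.8, 0.5, 0.6, 0.4, 0.6]   # pass proportions
    sum_pq = sum(pi * (1 - pi) for pi in p)                  # sum of p*q = 1.99
    n = len(p)                                               # number of items
    var_total = 2.17 ** 2                                    # SDt^2 from SD = 2.17

    kr20 = (n / (n - 1)) * (var_total - sum_pq) / var_total
    print(round(kr20, 2))                                    # ~0.64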

Values of .8 or higher are considered satisfactory for a test of 50 or more items. For a short
quiz of 10 items, which also contains some flawed items (as pointed out by the p and q
values), the reliability index is bound to be modest. When revised, the quiz would gain in
reliability, and if more items are added to it, provided they are good ones, it will improve
still more. The degree of reliability is a function of the number of items in a quiz, and
lengthened tests cover the contents / subject matter more comprehensively. The more the
merrier!

Validity
Validity indicates whether a test measures what it purports to measure. A test based on
MCQs usually covers the entire course and is therefore potentially a more valid assessment
than a descriptive test.

The scores of a less valid test are less credible in warranting a student’s mastery of the
course material; it is therefore not safe to draw inferences or decisions from them. Assessing
content validity is thus salient and essential to an educational test.

Course exams or scholastic tests are required to cover and represent the entire course
domain / knowledge to be called valid tests. Subject specialists or experts judge how valid a
test is by its contents.


Content validity essentially involves systematic examination of the test contents to
determine whether they cover a representative sample of the knowledge domain / course,
along with the learning / course objectives in the right proportion, as specified in the table
of specifications (see Appendix-B).

For example, a test intended to measure knowledge of fifth-grade science is judged by a
panel of 2-3 teachers, who estimate as experts how representative the test contents are of
grade-5 science and of the learning objectives of the course, with due emphasis. They rate
the test material accordingly, and the degree of agreement in their ratings is considered the
content validity index.

Another method of establishing validity is to correlate scores on the test with another
established test or criterion: for example, correlating scores on an Economics test in a local
university with overall GPA or with GRE scores. This procedure is called criterion-related
validity.
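
As an illustration, Pearson's r between the two sets of scores can be computed with the
Python standard library (the figures below are invented purely for demonstration):

    from statistics import correlation  # Python 3.10+

    # Hypothetical scores on a local Economics test and the criterion (GPA).
    economics_test = [55, 62, 70, 48, 80, 66, 74, 58]
    gpa = [2.6, 2.9, 3.2, 2.4, 3.8, 3.0, 3.5, 2.7]

    # The correlation coefficient serves as the criterion-related validity estimate.
    print(round(correlation(economics_test, gpa), 2))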

Practicality
MCQs are practically useful and efficient, especially in large-scale testing situations, unlike
descriptive tests, which are more resource intensive and demand time and money.
Increasing enrolment in colleges and universities has led staff to incorporate objective
testing into their assessment for more efficient examination of students.

MCQs are versatile and adaptable to various levels of learning / educational objectives
(recall, comprehension, application, analysis).

MCQs test a broad area of knowledge in a short time and are easy to score. Moreover, they
yield rich data for psychometric analysis.

To develop quality MCQs, faculty need sufficient know-how of the techniques of test
construction, besides the motivation to persevere in the intensive work of framing and
pre-testing question papers. It also requires them to orient students to MCQs as a system of
assessment and evaluation.

Objectivity
It refers to fairness and uniformity in the test-scoring procedure. Examiner / rater bias is
non-existent in MCQ-based tests; that is why they are called objective tests.

Further, the test data are analyzed statistically, which assesses various dimensions of the
tests and makes them more precise and accurate instruments.


Professional Ethics in Educational Assessment


Resource: Dr. Pat Nellor Wickwire, Education Resource Information Center (ERIC)

ACCOUNTABILITY TO ALL STAKEHOLDERS: INTERNAL & EXTERNAL CLIENTS

 Professional Norms / Ethical Standards
 Professional Applications / Code of Conduct
1. Teacher as scientist and practitioner
2. Professional Practice Ethics
 Engage Empowerment (E)
 Temper Tone (T)
 Honor Humility (H)
 Internalize Integrity (I)
 Communicate Commitment (C)
 Synthesize Standards (S)

CONTRIBUTE TO THESE HIGH PRINCIPLES

 Welfare and development of the client
 Equal access for all clients
 Maintaining loyalty to all clients
 The final decision is that of the individual

DESIGN & IMPLEMENT EDUCATIONAL ASSESSMENT AS A VALUE-ADDED COMPONENT
TO LEARNING
 Select the assessment method (searching available and constructable material)
 Representativeness of domain sampling
 Meaningful score reporting
 Balance between highest quality and greatest benefits

PREPARATION
 Printing of question papers (uniform / varied)
 Exam hall size & number of examinees
 Training in test administration; monitoring students
 Security of material (with precise accounting)

MARKING / GRADING & REPORTING

 Timeliness, accuracy and clarity
 Result card / Transcript

INTERPRETATION OF ASSESSMENT RESULTS

 Curved or absolute scores
 How many slabs / grades; grade cut-off scores
 Local examining rules and regulations
 Scoring, record keeping and access to class results

COMMUNICATE
 Results to be conveyed in clear and understandable terms

APPLICATION (of results / assessment)

 For the welfare, development and growth of students in terms of:


 Selection of future subjects / courses / careers
 Diagnostic counseling & remedial education
 Institutional education planning / preparing students for future programs
 Changes in future instructional strategies / interventions

EVALUATION
 Of formative and summative assessment
 Judging learning outcomes of students
 Improvements in A-E above

References

Bloom, B. S. (1956). Taxonomy of educational objectives. Vol. I: Cognitive domain. New
York: Longmans, Green.

Cox, K. R., & Bandaranayake, R. (1978). How to write good multiple choice questions.
Medical Journal (Medline), 2, 553–554.

Kaplan, R. M., & Saccuzzo, D. P. (2002). Psychological testing: Principles, applications &
issues (7th ed.).

Wickwire, P. N. (2004). Application of professional ethics in educational assessment.
Educational Resources Information Center (ERIC), Chapter 25, pp. 349-362. (Appendix-D)

http://testing.byu.edu/info/handbook/betteritems.pdf


List of Appendices

Appendix-A: Activity material for flawed items (Bibliography of Multiple-Choice
Question Resources)
Appendix-B: Table of Specifications / Test Blueprint
Appendix-C: Cognitive Domain, Instructional Objectives and item example
Appendix-D: Professional Ethics in Assessment (Chapter Reading)
Appendix-E: PowerPoint Presentations


Appendix-A Activity material for flawed items

Bibliography of Multiple-Choice Question Resources

Books:
 Bloom, Benjamin S. (Ed.) Taxonomy of Educational Objectives: The Classification of
Educational Goals, by a committee of college and university examiners. 1st Ed. New
York: Longmans, Green, 1956.
 Davis, Barbara Gross. Tools for Teaching. San Francisco: Jossey-Bass, 1993.
 Erickson, Bette LaSere and Diane Weltner Strommer. Teaching College Freshmen. San
Francisco: Jossey-Bass, 1991.
 Jacobs, Lucy Cheser and Clinton I. Chase. Developing and Using Tests Effectively: A
Guide for Faculty. San Francisco: Jossey-Bass, 1992.
 McKeachie, Wilbert. Teaching Tips: Strategies, Research, and Theory for College and
University Teachers (9th Ed.). Lexington, Mass: D.C. Heath and Company, 1994.
 Miller, Harry G., Reed G. Williams, and Thomas M. Haladyna. Beyond Facts: Objective Ways
to Measure Thinking. Englewood Cliffs: Educational Technology Publications, 1978.

Articles:
 Clegg, Victoria L. and William E. Cashin. "Improving Multiple-Choice Tests." Idea Paper #16,
Center for Faculty Evaluation and Development, Kansas State University, 1986.
 Fuhrman, Miriam. "Developing Good Multiple-Choice Tests and Test Questions." Journal of
Geoscience Education 44 (1996): 379-384.
 Johnson, Janice K. ". . . Or None of the Above." The Science Teacher 56.2 (1989) 57-61.

Websites:
 University of Cape Town's Guide to Designing and Managing Multiple Choice Questions

Contact:
Email: tep@uoregon.edu, Phone: 541-346-2177 Fax: 541-346-2184
University of Oregon.


Appendix-B

Table of Specifications / Test Blueprint

Once you know the learning objectives and the item types you want to include in your test,
you should create a test blueprint. A test blueprint, also known as test specifications,
consists of a matrix representing the number of questions you want in your test within each
topic and level of objectives. The blueprint identifies the objectives and skills that are to be
tested and their relative weightage, and can help you steer through the desired coverage of
topics as well as levels of objectives. Once you create your test blueprint, you can begin
writing your items!

                Topic A    Topic B    Topic C    Topic D    TOTAL
Knowledge          1          2          1          1        5 (12.5%)
Comprehension      2          1          2          2        7 (17.5%)
Application        4          4          3          4       15 (37.5%)
Analysis           3          2          3          2       10 (25%)
Synthesis          -          1          1          -        2 (5%)
Evaluation         -          -          -          1        1 (2.5%)
TOTAL          10 (25%)   10 (25%)   10 (25%)   10 (25%)    40

 This sketch indicates a plan for 40 items.
 The 40 items equally cover the four topics being examined through this test, that
is, 10 items (25% of the test) for each topic.
 The items will, moreover, test 6 objectives: knowledge, comprehension,
application, analysis, synthesis & evaluation. The table further shows that the
test constructor wants to write 5 items testing knowledge of the various topics,
a few more (7) testing comprehension, still more (15) testing application, and so on.
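
As an illustration, the blueprint can be kept as a small data structure and its totals
checked automatically. The Python sketch below is hypothetical (topics abbreviated to A-D)
and simply verifies that the matrix adds up to the planned 40 items:

    # Blueprint matrix: items per level of objective and per topic.
    blueprint = {
        "Knowledge":     {"A": 1, "B": 2, "C": 1, "D": 1},
        "Comprehension": {"A": 2, "B": 1, "C": 2, "D": 2},
        "Application":   {"A": 4, "B": 4, "C": 3, "D": 4},
        "Analysis":      {"A": 3, "B": 2, "C": 3, "D": 2},
        "Synthesis":     {"B": 1, "C": 1},
        "Evaluation":    {"D": 1},
    }

    total = sum(sum(row.values()) for row in blueprint.values())
    for level, row in blueprint.items():
        count = sum(row.values())
        print(f"{level:<13} {count:2d} items ({count / total:.1%})")
    print("Total items:", total)  # 40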


Appendix-C

Cognitive Domain, Instructional Objectives and item example


1-Knowledge

Outcome: Identifies the meaning of a term.


Reliability is the same as:
A. consistency* B. relevancy C. representativeness D. usefulness

Outcome: Identifies the order of events.


What is the first step in constructing an achievement test?
A. Decide on test length. B. Identify the intended learning outcomes*
C. Prepare a table of specifications. D. Select the item types to use.

2-Comprehension

Outcome: Identifies an example of a concept or principle.


Which of the following is an example of a criterion-referenced interpretation?
A. Derik earned the highest score in science
B. Erik completed his experiment faster than his classmates
C. Edna’s test score was higher than 50 percent of the class
D. Tricia set up her laboratory equipment in five minutes*

3-Application

Outcome: Distinguishes between properly and improperly stated outcomes.

Which one of the following learning outcomes is properly stated in terms of student
performance?
A. Develops an appreciation of the importance of testing.
B. Explains the purpose of test specifications*
C. Learns how to write good test items.
D. Realizes the importance of validity.

4-Analysis

Directions: Read the following comments a teacher made about testing. Then answer the
questions that follow by circling the letter of the best answer.

“Students go to school to learn, not to take tests. In addition, tests cannot be used to indicate
a student’s absolute level of learning. All tests can do is rank students in order of
achievement, and this relative ranking is influenced by guessing, bluffing, and the subjective
opinions of the teacher doing the scoring. The teaching-learning process would benefit if we
did away with tests and depended on student self-evaluation.”

Outcome: Recognizes unstated assumptions.

Which one of the following unstated assumptions is this teacher making?

A. Students go to school to learn.


B. Teachers use essay tests primarily.
C. Tests make no contribution to learning*


D. Tests do not indicate a student’s absolute level of learning.

Outcome: Identifies the meaning of a term.

Which one of the following types of test is this teacher primarily talking about?

A. Diagnostic test B. Formative test C. Pretest D. Summative test*

5-Synthesis
Given a short story, the student will write a different but plausible ending.
(See paragraph for analysis items)
Outcome: Identifies relationships.

Which one of the following propositions is most essential to the final conclusion?

A. Effective self-evaluation does not require the use of tests*


B. Tests place students in rank order only.
C. Test scores are influenced by factors other than achievement.
D. Students do not go to school to take tests.

6-Evaluation
Given a description of a country’s economic system, the student will defend it by basing
arguments on principles of socialism.

Reference:
1. Kubiszyn, K., & Borich, G. (1984). Educational testing and measurement:
Classroom application and practice. Glenview, IL: Scott, Foresman, pp. 53-55.

2. Gronlund, N. E. (1998). Assessment of Student Achievement. Boston: Allyn and


Bacon.

