Topic 5: How To Assess? - Essay Tests
LEARNING OUTCOMES
INTRODUCTION
In Topic 4, we discussed in detail the use of objective tests in assessing students.
In this topic, we will examine a different type of test called the essay test. The essay
test is a popular technique for assessing learning and is used extensively at all
levels of education.
(a) The learner has to compose rather than select his or her response or answer.
In essay questions, students have to construct their own answer and decide
on what material to include in their response. Objective test questions (MCQ,
true-false, matching), on the other hand, require students to select the answer
from a list of possibilities.
(b) The response or answer the learner provides will consist of one or more
sentences. Students do not respond with a "yes" or "no" but instead have to
respond in the form of sentences. In theory, there is no limit to the length of
the answer. However, in most cases, its length is predetermined by the
demand of the question and the time limit allotted for the test question.
(c) There is no one single correct response or answer. In other words, the
question should be composed so that it does not ask for one single correct
response. For example, the question "Who killed JWW Birch?" assesses
verbatim recall or memory and not the ability to think. Hence, it cannot
qualify as an essay question. You can modify the question to "Who killed JWW
Birch? Explain the factors that led to the killing." Now, this is an essay
question that assesses students' ability to think and give reasons for the
killing supported with relevant evidence.
whether the student has listed the three reasons correctly as long as the list
of three reasons is available as an answer key. For the question "To what
extent is commerce the main reason for the opening of Penang by the British
in 1786?", a subject matter expert is needed to grade or mark the answer to
this essay test question.
(ii) List five guidelines for writing good essay items. For each guideline,
write a short statement explaining why it is useful in improving the
validity of essay assessment.
As shown in the examples, students are told specifically what to do and how
to respond. The questions indicate the number of points required and/or the
scope of the responses. Students' responses can also be restricted by
including interpretive material (e.g. a graph, a paragraph describing a
particular problem or an extract from a literary work) and asking students to
respond to one or two questions based on it.
The restricted response questions are more structured and are useful for
measuring learning outcomes requiring the interpretation and application of
knowledge in a specific area. They narrow the focus of the assessment task
to a specific and well-defined performance. The nature of these questions
makes it more likely that the students will interpret each question the way it
is intended. The teacher is also in a better position to assess the correctness
of students' answers when a question is focused and all students interpret it
in the same way. When the teacher is clear about what makes up correct
In responding to extended response essay questions, students are free to select any
information that they think pertinent, to organise the answer according to their
best judgement, and to integrate and evaluate the ideas they deem appropriate.
This freedom enables them to demonstrate their ability to analyse problems,
organise their ideas, express them in their own words and/or develop a coherent
argument. Extended response essay questions are therefore useful in assessing
higher-order thinking skills. They can also be used to assess writing skills.
The freedom for students to respond to extended response essay questions can
cause some problems. First, there is usually no single correct answer to the
question. Students are free to choose the way to respond, and the degree of
correctness or merit of their answers can only be judged by a skilled subject-matter
expert. A large number of examiners is required if the assessment involves a big
student population. Inter-rater reliability in scoring can be an issue. Second, the
same freedom that enables the demonstration of creative expression and other
higher-order thinking skills makes the extended response essay question
inefficient for measuring more specific learning outcomes. Third, the extended
response essay questions require good writing skills on the part of the students.
This type of question is thus disadvantageous to students whose writing skills are
poor. Due to these limitations, it is often recommended that restricted
response essay questions be used in place of extended response essay questions.
ACTIVITY 5.1
Select a few essay questions that have been used in tests or examinations.
To what extent do these questions meet the criteria of an essay question
as defined by Stalnaker (1951) and elaborated by Reiner et al. (2002)?
(b) Thinking skills that require more than simple verbatim recall of information
by challenging the students to reason with their knowledge.
To determine what type of test (essay or objective) to use, it is helpful to
examine the verb(s) that best describe the desired ability to be assessed (refer to
Topic 2).
These verbs indicate what students are expected to do and how they should
respond. They focus students' responses and channel them towards the
performance of specific tasks. Some verbs clearly indicate that students need
to construct rather than select their answer (such as to explain). Other verbs
indicate that the intended learning outcome is focused on students' ability to
recall information (such as to list); recall is perhaps best assessed through
objectively scored items. Verbs that test for understanding of subject matter or
other forms of higher-order thinking, but do not specify whether the student is to
construct or select the response (such as to interpret), can be assessed either by
essay questions or by objective items.
ACTIVITY 5.2
(b) Essay questions have limitations in reliability. While essay questions allow
students some flexibility in formulating their responses, the reliability of
marking or grading is questionable. Different markers or graders may vary
in their marking or grading of the same or similar responses (inter-scorer
reliability) and one marker can vary significantly in his or her marking or
grading consistency across questions depending on many factors (intra-
scorer reliability). Therefore, essay answers of similar quality may receive
notably different scores. Characteristics of the learner, length and legibility
of responses, and personal preferences of the marker or grader with regard
to the content and structure of the response are some of the factors that may
lead to unreliable marking or grading.
(c) Essay questions require more time for marking student responses. Teachers
need to invest a large amount of time to read and mark students' responses
to essay questions. On the other hand, relatively little or no time is required
for teachers to score objective test items like multiple-choice items and
matching exercises.
(d) As mentioned earlier, one of the strengths of essay questions is that they
provide students with authentic experiences because students are challenged
to construct rather than select their responses. However, to what extent does
the short time normally allotted to a test affect students' responses? Students
have relatively little time to construct their responses, and this time limit
does not allow them to give appropriate attention to the complex process of
organising, writing and reviewing their responses. In fact, in responding to
essay questions, students use a writing process that is quite different from
the typical process that produces excellent writing (draft, review, revise and
evaluate). In addition, students usually have no resources to aid their writing
(such as a dictionary or thesaurus) when answering essay questions. This
disadvantage may offset whatever advantage accrues from the fact that
responses to essay questions are more authentic than responses to
multiple-choice items.
(d) Essay Questions Benefit All Students by Placing Emphasis on the Importance
of Written Communication Skills
Written communication is a life competency that is required for effective and
successful performance in many vocations. Essay questions challenge
students to organise and express subject matter and problem solutions in
their own words, thereby giving them a chance to practise written
communication skills that will be helpful to them in future vocational
responsibilities. At the same time, the focus on written communication skills
is also a serious disadvantage for students who have marginal writing skills
but know the subject matter being assessed. If students who are
knowledgeable in the subject obtain low scores because of their inability to
write well, the validity of the test scores will be diminished.
SELF-CHECK 5.1
ACTIVITY 5.3
Compare the following two essay questions and decide which one
assesses higher-order thinking skills.
(a) "What are the major advantages and limitations of solar energy?"
Here are specific guidelines that can help you improve existing essay questions
and create new ones.
(b) Avoid Using Essay Questions for Intended Learning Outcomes that are
Better Assessed with Other Kinds of Assessment
Some types of learning outcomes can be more efficiently and more reliably
assessed with objective tests than with essay questions. Since essay questions
sample a limited range of subject matter or content, are more time-
consuming to score and involve greater subjectivity in scoring, the use of
essay questions should be reserved for learning outcomes that cannot be
better assessed by some other means. Let us look at Example 5.1.
Example 5.1:
Learning Outcome:
To be able to differentiate the reproductive habits of birds and amphibians.
Essay Question:
What are the differences in egg laying characteristics between birds and
amphibians?
Objective Item:
Which of the following differences between birds and amphibians is correct?
Option | Birds                    | Amphibians
A      | Lay a few eggs at a time | Lay many eggs at a time
B      | Lay eggs                 | Give birth
C      | Do not incubate eggs     | Incubate eggs
D      | Lay eggs in nest         | Lay eggs on land
(i) The problem of student responses containing ideas that were not meant
to be assessed; and
Although more structure helps to avoid these problems, how much and what
kind of structure and focus to provide depend on the intended learning
outcome that the essay question is meant to assess. Writing effective essay
questions involves defining the task and delimiting the scope of the content
so that the question is aligned with the intended learning outcome it assesses
(as illustrated in Figure 5.1).
The verb is "evaluate", which is the task the student is supposed to do. The
scope of the question is the impact of the Industrial Revolution on England.
Very little guidance is given to students about the task of evaluating and the
scope of the task. A student reading the question may ask:
(ii) Evaluate based on what criteria? The significance of the revolution? The
quality of life in England? Progress in technological advancements?
(The task is not clear.)
SELF-CHECK 5.2
2. What is the difference between the task and the scope of an essay
question?
(e) Specify the Approximate Time Limit and Marks Allotted to Each Question
Specifying the approximate time limit helps students allocate their time in
answering several essay questions. Without such guidelines, students may
feel at a loss as to how much time to spend on a question. When deciding the
guidelines for how much time should be spent on a question, keep the slower
students and students with certain disabilities in mind. Also make sure that
students can be realistically expected to provide an adequate answer in the
given and/or suggested time. Similarly, state the marks allotted to each
question so that students can estimate how much they should write to
answer the question.
(f) Use Several Relatively Short Essay Questions Rather than One Long
Question
Only a very limited number of essay questions can be included in a test
because of the time it takes for students to respond to them and the time it
takes for teachers to grade the students' responses. This creates a challenge
with regard to designing valid essay questions. Shorter essay questions are
better suited to assess the depth of student learning within a subject, whereas
longer essay questions are better suited to assess the breadth of student
learning within a subject. Hence, there is a trade-off when choosing between
several short essay questions and one long question. Focus on assessing the
(ii) Some questions are likely to be harder, which could make the
comparative assessment of students' abilities unfair.
Last but not least, let us improve the essay questions through preview and review.
The following steps can help you improve the essay item before and after you
administer it to your students.
Before using the question in a test, ask a knowledgeable person in the subject
to critically review the essay question, the model answer and the intended
learning outcome to determine how well they are aligned with each other.
In addition, you can use a checklist as shown in Figure 5.2 to check your essay
questions.
SELF-CHECK 5.3
1. Why should you specify the time allotted for answering each
question?
Verb | Definition | Example
Illustrate | Use a word picture, a diagram, a chart or a concrete example to clarify a point. | Illustrate the use of catapults in the amphibious warfare of Alexander.
Infer | Draw a logical conclusion from presented information. | What can you infer happened in the experiment?
Interpret | Give the meaning of; change from one form of representation (such as numerical) to another (such as verbal). | Interpret the poetic line, "The sound of a cobweb snapping is the noise of my life."
Justify | Show good reasons for; give your evidence; present facts to support your position. | Justify the American entry into the Second World War.
List | Create a series of names or other items. | List the major functions of the human heart.
Predict | Know or tell beforehand, with precision of calculation, knowledge or shrewd inference from facts or experience, what will happen. | Predict the outcome of a chemical reaction.
Propose | Offer for consideration, acceptance or action; suggest. | Propose a solution for landslides along the North-South Highway.
Recognise | Locate knowledge in long-term memory that is consistent with presented material. | Recognise the important events on the road to independence in Malaysia.
Recall | Retrieve relevant knowledge from long-term memory. | Recall the dates of important events in Islamic history.
Summarise | Sum up; give the main points briefly. | Summarise the ways in which man preserves food.
Trace | Follow the course of; follow the trail of; give a description of progress. | Trace the development of television in school instruction.
The definitions specify thought processes a person must perform to complete the
mental tasks. Note that this list is not exhaustive and local examples have been
introduced to illustrate the mental tasks required in each essay question.
ACTIVITY 5.4
(a) Select some essay questions in your subject area and examine
whether the verbs used are similar to those in the list given in
Table 5.1. Do you think the tasks required by the verbs used are
appropriate? Justify.
(b) Do you think students are able to differentiate between the tasks
required in the verbs listed? Justify.
(c) Are teachers able to describe to students the tasks required by using
these verbs? Explain.
(a) Checklist
In a checklist, a score is awarded for every correct or relevant point in a
response. The sum of these individual scores provides the final score of the
response. Table 5.2 is an example of a checklist.
(b) Rubric
The two most common approaches used in scoring rubrics are the holistic
and the analytic methods.
Then, each paper is awarded the points appropriate to the bin it is in. The
judgement is based on an overall impression. The holistic method is also
referred to as global or impressionistic marking.
How best can a teacher use the holistic method in scoring students'
responses? Before he or she starts marking, the teacher can develop a
description of the type of response that would illustrate each category,
and then try out this draft version using several actual papers. After
reading and categorising all of the papers, it is a good idea to re-
examine the papers within a category to see if they are similar enough
in quality to receive the same points or grade. It may be faster to read
essays holistically and provide only an overall score or grade, but
students do not receive much feedback about their strengths and
weaknesses. Some instructors who use holistic scoring also write brief
comments on each paper to point out one or two strengths and/or
weaknesses so students will have a better idea of why their responses
received the scores they did.
Holistic scoring gives students a single, overall score for the response as
a whole. Analytic scoring gives students a separate rating for each
criterion. For example, based on the rubric, a student's response may get
3 points for focus/organisation, 2 points for elaboration and 4 points for
mechanics, giving a total of 9 marks.
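The arithmetic behind checklist and analytic scoring can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names, the checklist points and the weights are hypothetical and are not taken from Table 5.2 or Table 5.5; only the 3/2/4 ratings come from the example above. A holistic score, by contrast, is a single overall judgement, so there is nothing to add up.

```python
def checklist_score(points_awarded):
    """Checklist: one score per correct or relevant point; the final score is their sum."""
    return sum(points_awarded.values())


def analytic_score(ratings, weights=None):
    """Analytic rubric: one rating per criterion, combined into a total.

    `weights` is a hypothetical multiplier per criterion, as a weighted
    analytic marking scheme might use; when omitted, every criterion counts once.
    """
    if weights is None:
        weights = {criterion: 1 for criterion in ratings}
    return sum(rating * weights[criterion] for criterion, rating in ratings.items())


# Hypothetical checklist: 1 mark for each relevant point found in the response.
checklist = {"point 1": 1, "point 2": 1, "point 3": 0, "point 4": 1}
print(checklist_score(checklist))  # 3

# Ratings from the example in the text.
ratings = {"focus/organisation": 3, "elaboration": 2, "mechanics": 4}
print(analytic_score(ratings))  # 9

# Hypothetical weights that count organisation and elaboration double.
weights = {"focus/organisation": 2, "elaboration": 2, "mechanics": 1}
print(analytic_score(ratings, weights))  # 14
```

Keeping the per-criterion ratings, rather than only the total, is what lets analytic scoring give students feedback on specific strengths and weaknesses.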
Table 5.5: Sample of a Marking Scheme Using the Weighted Analytic Method
(a) Grade the papers anonymously. This will help control the influence of our
expectations of the student on the evaluation of the answer.
(b) Read and score the answers to one question before going on to the next
question. In other words, score all the students' responses to Question 1
before looking at Question 2. This helps to keep one frame of reference and
one set of criteria in mind through all the papers, which results in more
consistent grading. It also prevents an impression that we form in reading
one question from carrying over to our reading of the student's next answer.
(c) If a student has not done a good job on the first question, we may let this
impression influence our evaluation of the student's second answer.
However, if other students' papers come in between, we are less likely to be
influenced by the original impression.
(d) If possible, try to grade all the answers to one particular question without
interruption. Our standards might vary from morning to night or one day to
the next.
(e) Shuffle all the papers after each item is scored. Changing the order of papers
in this way reduces the context effect and the possibility that a student's score
is the result of the paper's position relative to other papers. If Rakesh's "B"
work always follows Jamal's "A" work, it might look more like "C" work, and
his grade would be lower than if his paper were somewhere else in the stack.
(f) Decide in advance how you are going to handle extraneous factors and be
consistent in applying the rule. Students should be informed about how you
treat such things as misspelled words, neatness, handwriting, grammar and
so on.
(g) Be on the alert for bluffing. Some students who do not know the answer may
write a well-organised coherent essay but one containing material irrelevant
to the question. Decide how to treat irrelevant or inaccurate information
contained in the students' answers. We should not give credit for irrelevant
material. It is not fair to other students who may also have preferred to write
on another topic, but instead wrote on the required question.
(h) Write comments on the students' answers. Teacher comments make essay
tests a good learning experience for students. They also serve to refresh your
memory of your evaluation should the student question the grade given.
(i) Be aware that the order in which papers are marked can affect the grades
awarded. A marker may grow more critical (or more lenient) after having read
several papers; thus, early papers may receive lower (or higher) marks than
papers of similar quality that are scored later.
(j) Also, when students are directed to take a stand on a controversial issue, the
marker must be careful to ensure that the evidence and the way it is
presented are evaluated, not the position taken by the student. If the student
takes a position which differs from that of the marker, the marker must be
aware of his or her own possible bias in marking the essay.
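Where scripts are marked on screen, guidelines (a), (b) and (e) can be combined into a simple marking plan: scripts identified by anonymous codes, all answers to one question marked before the next, and the stack reshuffled for every question. The Python sketch below is one possible way to do this; the function name `marking_plan` and the script codes are hypothetical, and a physical shuffle of paper scripts achieves the same thing.

```python
import random


def marking_plan(script_ids, questions, seed=None):
    """Return, for each question, a random order in which to mark the scripts.

    All answers to one question are marked before moving to the next
    (guideline (b)), and the order is reshuffled for each question
    (guideline (e)), which makes it unlikely that a script always
    follows the same neighbour.
    """
    rng = random.Random(seed)  # optional seed makes the plan reproducible
    plan = {}
    for question in questions:
        order = list(script_ids)  # copy so the caller's list is left untouched
        rng.shuffle(order)        # fresh random order for this question
        plan[question] = order
    return plan


# Scripts carry anonymous codes rather than names (guideline (a)).
scripts = ["S01", "S02", "S03", "S04", "S05"]
for question, order in marking_plan(scripts, ["Q1", "Q2", "Q3"], seed=7).items():
    print(question, order)
```

The seed is optional; fixing it simply lets a teacher regenerate the same plan if the marking is interrupted.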
ACTIVITY 5.4
There are two types of essays based on their function: restricted response and
extended response essay questions.
Essay questions have two variable elements: the degree to which the task is
structured and the degree to which the scope of the content is focused.
Specifying the approximate time limit helps students allocate their time in
answering several essay questions.
Avoid using essay questions for intended learning outcomes that are better
assessed with other kinds of assessment.
Moss, A., & Holder, C. (1988). Improving student learning: A guidebook for faculty
in all disciplines. Dubuque, IA: Kendall/Hunt.
Phillips, J. A., Ansary Ahmed, & Kuldip Kaur. (2005). Instructional design
principles in the development of an e-learning graduate course. Paper
presented at the International Conference in E-Learning, Bangkok, Thailand.
Reiner, C. M., Bothell, T. W., Sudweeks, R. R., & Wood, B. (2002). Preparing
effective essay questions. Stillwater, OK: New Forums Press.