Anila 8602
Assignment No. 2 (Units: 6-9)
Registration No: 0000090833
Answer:
VALIDITY
The validity of an assessment tool is the degree to which it measures what it is designed to measure.
For example, if a test is designed to measure the skill of adding three-digit numbers in mathematics, but the problems are presented in language too difficult for the ability level of the students, then it may not measure three-digit addition skill and consequently will not be a valid test. Many experts of measurement have defined this term; some of the definitions are given below. According to the Business Dictionary, validity is the degree to which an instrument, selection process, statistical technique, or test measures what it is designed to measure.
Cook and Campbell (1979) define validity as the appropriateness or correctness of inferences, decisions, or descriptions made on the basis of test results.
According to APA (American Psychological association) standards document the validity is the most
important consideration in test evaluation. The concept refers to the appropriateness, meaningfulness, and
usefulness of the specific inferences made from test scores. Test validation is the process of accumulating
evidence to support such inferences. Validity, however, is a unitary concept. Although evidence may be
accumulated in many ways, validity always refers to the degree to which that evidence supports the
inferences that are made from the scores. The inferences regarding specific uses of a test are validated, not the test itself.
In Howell's (1992) view, a valid test must measure specifically what it is intended to measure. According to Messick, validity is a matter of degree: a test is neither absolutely valid nor absolutely invalid.
He advocates that, over time, validity evidence will continue to gather, either enhancing or contradicting
previous findings.

EDUCATIONAL ASSESSMENT AND EVALUATION

Overall, we can say that in terms of assessment, validity refers to the extent to which a test's content is representative of the actual skills learned and whether the test allows accurate
conclusions concerning achievement. Therefore validity is the extent to which a test measures what it
claims to measure. It is vital for a test to be valid in order for the results to be accurately applied and
interpreted.
Validity concerns the appropriateness of particular uses of test scores; test validation is then the process of collecting evidence to justify the intended use of the scores. In order to collect evidence of validity, there are many types of validity methods that establish the usefulness of assessment tools. Some of them are described below.
Content Validity
Evidence of content validity comes from a judgmental process that may be formal or informal; the formal process follows a systematic procedure to arrive at a judgment. Content-related evidence involves the degree to which the content of the test matches a content domain associated with the construct. For example, a test of the ability to add two numbers should include a range of combinations of digits. A test with only one-digit numbers, or only even numbers, would not have good coverage of the content domain. Content-related evidence typically involves Subject Matter Experts (SMEs) evaluating test items against the test specifications. It is a non-statistical type of validity that involves the systematic examination of the test content to determine whether it covers a representative sample of the behaviour domain to be measured.
Curricular Validity
The extent to which the content of the test matches the objectives of a specific curriculum as it is formally
described. Curricular validity takes on particular importance in situations where tests are used for high-
stakes decisions, such as Punjab Examination Commission exams for fifth and eighth grade students and
Boards of Intermediate and Secondary Education Examinations. In these situations, curricular validity
means that the content of a test used to decide whether a student should be promoted to the next level should measure the curriculum that the student is taught in school. Curricular validity is evaluated by groups of curriculum/content experts, who are asked to judge whether the content of the test is parallel to the curriculum objectives and whether the test and curricular emphases are in proper balance. A table of specifications may help improve the validity of the test.
Construct Validity
Before defining the construct validity, it seems necessary to elaborate the concept of construct. It is the
concept or the characteristic that a test is designed to measure. A construct provides the target that a
particular assessment or set of assessments is designed to measure; it is a separate entity from the test
itself. According to Howell (1992), construct validity is a test's ability to measure factors that are relevant to the field of study. Construct validity is thus an assessment of the quality of an instrument or experimental design; it asks, 'Does it measure the construct it is supposed to measure?'

Construct validity refers to the extent to which operationalizations of a construct (e.g. practical tests
developed from a theory) do actually measure what the theory says they do. Construct validity evidence
involves the empirical and theoretical support for the interpretation of the construct. Such lines of
evidence include statistical analyses of the internal structure of the test including the relationships
between responses to different test items. They also include relationships between the test and measures
of other constructs. As currently understood, construct validity is not distinct from the support for the
substantive theory of the construct that the test is designed to measure. As such, experiments designed to
reveal aspects of the causal role of the construct also contribute to construct validity evidence.
Construct validity occurs when the theoretical constructs of cause and effect accurately represent the real-
world situations they are intended to model. This is related to how well the experiment is operationalised.
A good experiment turns the theory (constructs) into actual things you can measure. Sometimes just
finding out more about the construct (which itself must be valid) can be helpful. Construct validity addresses the constructs that are mapped into the test items; it is assured either by judgmental methods or by statistical analyses.
Criterion Validity
Criterion validity evidence involves the correlation between the test and a criterion variable (or variables)
taken as representative of the construct. In other words, it compares the test with other measures or
outcomes (the criteria) already held to be valid. If the test data and criterion data are collected at the same
time, this is referred to as concurrent validity evidence. If the test data is collected first in order to predict
criterion data collected at a later point in time, then this is referred to as predictive validity evidence.
Concurrent Validity
According to Howell (1992) “concurrent validity is determined using other existing and similar tests
which have been known to be valid as comparisons to a test being developed. There is no other known
valid test to measure the range of cultural issues tested for this specific group of subjects”. Concurrent
validity refers to the degree to which scores taken at one point correlate with other measures (test,
observation or interview) of the same construct that is measured at the same time. Returning to the
selection test example, this would mean that the tests are administered to current employees and then
correlated with their scores on performance reviews. This measures the relationship between measures made with existing tests; the existing test is thus the criterion. For example, a new measure of creativity could be validated against an established creativity test.
Predictive Validity
Predictive validity assures how well the test predicts some future behaviour of the examinee. It
refers to the degree to which the operationalizations can predict (or correlate with) other measures of the
same construct that are measured at some time in the future. Again, with the selection test example, this
would mean that the tests are administered to applicants, all applicants are hired, their performance is
reviewed at a later time, and then their scores on the two measures are correlated. This form of the
validity evidence is particularly useful and important for aptitude tests, which attempt to predict how well examinees will perform in the future.

This measures the extent to which a future level of a variable can be predicted from a current
measurement. This includes correlation with measurements made with different instruments. For example,
a political poll intends to measure future voting intent. College entry tests should have a high predictive
validity with regard to final exam results. When the two sets of scores are correlated, the resulting coefficient is called a validity coefficient.
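As a rough sketch of how a validity coefficient is obtained, the snippet below correlates entry-test scores with final-exam scores collected later, using the Pearson correlation. All scores are invented for illustration.

```python
# Sketch: a predictive-validity coefficient is the Pearson correlation
# between entry-test scores and criterion scores collected later.
# The scores below are invented for illustration.

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x ** 0.5 * var_y ** 0.5)

entry_test = [55, 62, 70, 48, 81, 66, 74, 59]   # scores at admission
final_exam = [58, 65, 72, 50, 85, 63, 78, 61]   # scores one year later

print(f"validity coefficient r = {pearson_r(entry_test, final_exam):.2f}")
```

A coefficient near +1 would indicate strong predictive validity; a coefficient near 0 would indicate that the entry test tells us little about later achievement.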
Question No: 2 What are the rules of writing Multiple choice test items?
Answer:
Norman E. Gronlund (1990) writes that the multiple choice question is probably the most popular as well
as the most widely applicable and effective type of objective test. Student selects a single response from a
list of options. It can be used effectively for any level of course outcome. It consists of two parts: the stem, which states the problem, and a list of three to five alternatives, one of which is the correct (key) answer while the others are distracters ("foils", incorrect options that draw the less knowledgeable pupil away from the correct response). The stem may be stated as a direct question or as an incomplete statement, for example:

Direct question: Which city is the capital of Pakistan?
Incomplete statement: The capital of Pakistan is
A. Lahore.
B. Karachi.
C. Islamabad.
D. Peshawar.
Only list plausible distracters, even if the number of options per question changes
Experts encourage multiple-choice items to be prepared as questions (rather than incomplete statements)
Use memory-plus application questions. These questions require students to recall principles, rules or facts and then apply them. The key to preparing memory-plus application questions is to place the concept in a life situation or context that requires the student to first recall the facts and then apply or transfer them to that situation.
Seek support from others who have experience writing higher-level thinking multiple-choice
questions.
6. Be Grammatically Correct
Students will be more likely to select the correct answer by finding the grammatically correct option
Avoid answering one question in the test by giving the answer somewhere else in the test
Have the test reviewed by someone who can find mistakes, clues, grammar and punctuation problems
Students may be able to find an incorrect answer without knowing the correct answer
9. Use Only One Correct Option (Or be sure the best option is clearly the best option)
The item should include one and only one correct or clearly best answer
With one correct answer, alternatives should be mutually exclusive and not overlapping
Such as:
Questions 1-10 are multiple-choice questions designed to assess your ability to remember or recall basic facts. Please read each question carefully before reading the answer options. When you have a clear idea of the question, find your answer and mark your selection on the answer sheet. Please do not make any stray marks on the answer sheet.
11. Use Only a Single, Clearly-Defined Problem and Include the Main Idea in the Question
14. Don’t Use MCQ When Other Item Types Are More Appropriate
Advantages
Can be written so that they test a wide range of higher-order thinking skills
Can cover lots of content areas on a single exam and still be answered in a class period
Disadvantages
Often test literacy skills: "if the student reads the question carefully, the answer is easy to recognize"
Provide unprepared students the opportunity to guess, and with guesses that are right, they get credit
Expose students to misinformation that can influence subsequent thinking about the content
Answer:
MEASUREMENT SCALES
Measurement scales are critical because they relate to the types of statistics you can use to analyze your data. An easy way to have a paper rejected is to have used either an incorrect scale/statistic combination or to have used a low-powered statistic on a high-powered set of data. The following four levels of measurement scales are commonly distinguished so that the proper analysis can be used on the data: a number can be used merely as a label, or it can also convey order or quantity. Each scale of measurement has properties that determine how to properly analyse the data. These properties are described below.
Properties of Measurement
Magnitude: Magnitude means that the values have an ordered relationship to one another, so there is a specific order to the values.
Equal intervals: Equal intervals mean that data points along the scale are equal, so the difference
between data points one and two will be the same as the difference between data points five and six.
A minimum value of zero: A minimum value of zero means the scale has a true zero point. Degrees,
for example, can fall below zero and still have meaning. But if you weigh nothing, you don’t exist.
By understanding the scale of measurement of their data, analysts can determine the kind of statistical analysis that is appropriate.
1. Nominal Scale
The nominal scale of measurement defines the identity property of data. This scale has certain
characteristics, but doesn’t have any form of numerical meaning. The data can be placed into categories
but can't be multiplied, divided, added or subtracted from one another. It's also not possible to measure the difference between data points.
Nominal scales are the lowest scales of measurement. A nominal scale, as the name implies, is simply
some placing of data into categories, without any order or structure. You are only allowed to examine if a
nominal scale datum is equal to some particular value or to count the number of occurrences of each
value. For example, the blood groups of classmates can be categorized into A, B, AB, O, etc. The only
mathematical operation we can perform with nominal data is to count. Variables assessed on a nominal
scale are called categorical variables; Categorical data are measured on nominal scales which merely
assign labels to distinguish categories. For example, gender is a nominal scale variable.

Classifying Nominal Data
Nominal with order: Some nominal data can be sub-categorised in order, such as "cold, warm, hot".
Nominal without order: Nominal data can also be sub-categorised as nominal without order, such as blood group or eye colour.
Dichotomous: Dichotomous data is defined by having only two categories or levels, such as "yes" and "no".
2. Ordinal Scale
The ordinal scale defines data that is placed in a specific order. While each value is ranked, there’s no
information that specifies what differentiates the categories from each other. These values can’t be added
to or subtracted from.
Something measured on an "ordinal" scale does have an evaluative connotation. You are also allowed to
examine if an ordinal scale datum is less than or greater than another value. For example rating of job
satisfaction on a scale from 1 to 10, with 10 representing complete satisfaction. With ordinal scales, we
only know that 2 is better than 1 or 10 is better than 9; we do not know by how much. It may vary. Hence,
you can 'rank' ordinal data, but you cannot 'quantify' differences between two ordinal values.
3. Interval Scale
The interval scale contains properties of nominal and ordered data, but the difference between data points
can be quantified. This type of data shows both the order of the variables and the exact differences
between the variables. They can be added to or subtracted from each other, but not multiplied or divided.
An ordinal scale with quantifiable differences between values becomes an interval scale. You are allowed to
quantify the difference between two interval scale values but there is no natural zero. A variable measured
on an interval scale gives information about more or better as ordinal scales do, but interval variables have
an equal distance between each value. The distance between 1 and 2 is equal to the distance between 9
and 10.
For example, temperature scales are interval data with 25C warmer than 20C and a 5C difference has
some physical meaning. Note that 0C is arbitrary, so that it does not make sense to say that 20C is twice
as hot as 10C, but there is exactly the same difference between 100C and 90C as there is between 42C and 32C.

4. Ratio Scale
Something measured on a ratio scale has the same properties that an interval scale has except, with a ratio
scaling, there is an absolute zero point. Ratio scales of measurement include properties from all four
scales of measurement. The data is nominal and defined by an identity, can be classified in order, contains
intervals and can be broken down into exact value. Weight, height and distance are all examples of ratio
variables. Data in the ratio scale can be added, subtracted, divided and multiplied.
Ratio scales also differ from interval scales in that the scale has a 'true zero': a value of zero means the absence of the quantity being measured. An example of this is height or weight, as someone cannot be zero centimetres tall or weigh zero kilos, or be negative centimetres or negative kilos. Examples of the use of this scale are calculating shares or sales. Of all types of data on the scales of measurement, data scientists can draw the most information from ratio data.
To summarize, nominal scales are used to label or describe values. Ordinal scales are used to provide
information about the specific order of the data points, mostly seen in the use of satisfaction surveys. The
interval scale is used to understand the order of data points and the differences between them. The ratio scale gives more
information about identity, order and difference, plus a breakdown of the numerical detail within each
data point.
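The summary above can be sketched in code: the mapping below records which operations each scale permits, following the rules in this section (counting only for nominal data, ranking for ordinal, differences for interval, and ratios only for ratio data). The mapping and examples are illustrative, not from the source.

```python
# Sketch: permissible statistics for each scale of measurement,
# following the rules described in this section.

PERMITTED = {
    "nominal":  {"count", "mode"},
    "ordinal":  {"count", "mode", "median", "rank"},
    "interval": {"count", "mode", "median", "rank", "mean", "difference"},
    "ratio":    {"count", "mode", "median", "rank", "mean", "difference", "ratio"},
}

def allowed(scale, statistic):
    """True if the statistic is meaningful for data measured on this scale."""
    return statistic in PERMITTED[scale]

# Blood group is nominal: we may count categories, but a 'mean blood group'
# is meaningless.
print(allowed("nominal", "count"))
print(allowed("nominal", "mean"))
# Celsius temperature is interval: differences are meaningful, ratios are not.
print(allowed("interval", "difference"))
print(allowed("interval", "ratio"))
```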
Answer:
Parent-Teacher Conferences
Parent-teacher conferences are mostly used in elementary schools. In such conferences portfolio are
discussed. This is a two-way flow of information and provides much information to the parents. But one
of the limitations is that many parents don't come to attend the conferences. It is also a time-consuming process. The student is also one of the key components of this process, since he/she is directly benefitted. In many
developed countries, it has become the most important way of informing parents about their children’s
work in school. Parent-teacher conferences are productive when they are carefully planned and skilfully conducted.
The parent-teacher conference is an extremely useful tool, but it shares three important limitations with the informal letter. First, it requires a substantial amount of time and skill. Second, it does not provide a systematic record of the student's progress. Third, some parents are unwilling to attend conferences, and attendance cannot be enforced. Parent-student-teacher conferences are frequently convened in many states of the USA
and some other advanced countries. In the US, this has become a striking feature of Charter Schools.
Some schools rely more on parent conferences than written reports for conveying the richness of how
students are doing or performing. In such cases, a school sometimes provides a narrative account of each student's progress.
Parent-teacher conferences have been an important part of educating generations of K–12 students.
Successful conferences require both parties to listen respectfully to what each has to say, and they are a valuable opportunity to work together in the student's interest.
A parent-teacher conference is a breeze when the student is doing well, but teachers are often in the
difficult position of explaining to parents that their child is struggling with some aspect of schoolwork.
This can trigger a wide range of responses. Some parents will blame their child, while others will blame the teacher.
Teachers need to assure parents that the conference isn’t about assigning blame. All parents can
encourage their children and advocate for them, including those who can’t provide much help with their
children’s homework.
The primary purpose of a parent-teacher conference is for the teacher to brief the parents on the child’s
academic progress and share anything notable about the child’s behavior and development at school.
It’s helpful to let parents know if their child is attentive in class, participates in discussions, or has some
potential that might only be obvious in the classroom. This is a great time to give parents suggestions
about how they can help their child and to tell them about any additional resources available, such as tutoring programs.
While the student’s academic progress is the main focus of parent-teacher conferences, the sessions are
much more than simply an opportunity to tell parents how their child can improve their grades.
Parent-teacher conferences, which are typically held only once per semester and seldom last more than 30
minutes, are also a rare opportunity for the teacher to learn from the parents about the child’s patterns for
doing school work, reading, and preparing for tests. As a teacher, you need to listen closely when parents
answer questions about anything that might affect the student’s academic performance.
When you’re assessing the student’s academic performance, you want to get as much information as
possible to determine whether they need help and what kind. It’s important to discuss factors outside the
classroom that could influence a student’s behavior, classroom focus, motivation, and their relationships
with schoolmates.
Your discussion with parents can touch on the student’s home life, family dynamics, or family finances.
While they’re sometimes awkward, frank discussions are necessary for the teacher to understand the
student’s challenges and for the parents to learn how they can help their child reach their academic
potential. It may be valuable to include school staff, such as counselors, in your meeting with the parents. While they are a lot of work, the most effective parent-teacher conferences boost family involvement in the child's education.
Parent-teacher conferences are not just for parents who have concerns about their child in the classroom.
These meetings provide an opportunity for parents to partner with teachers in providing a good learning environment for their child. The following tips can help parents make the most of a conference.
1. Prepare for the conference. Keep a folder with tests, papers, or any other topics you want to address.
Do your research by talking to your child; find out how she is doing in class, what’s happening at lunch
and recess.
2. Respect the teacher's time. Be prompt, try not to bring other children, and do not answer your cell phone during the conference.
3. Begin with a positive attitude. Start with a compliment for the teacher before addressing concerns.
4. Work together. Make it your purpose to partner with your teacher by being a team player. Approach issues the teacher raises with the attitude of "We have to work together on this problem."
5. Remember that this is about your child, not you. The teacher is not trying to tell you how to parent.
6. Listen and be open minded. A conference is for teachers to give parents information based on
classroom observation. The teacher would not bring up an issue if it was not necessary.
7. Teachers are not the enemy. If you have a problem, no need to go on the attack. They care about your
child and want to help you and your child resolve the issue.
8. Ask questions. After you've listened, it's your turn to ask questions. Make a list beforehand so you won't forget anything.
9. Learn the communication protocol. Ask the teacher her preferred method of communication: phone,
email, or notes. Let the teacher know you want to be an available parent.
10. Ask how you can be involved. Whether you work full-time, part-time or not at all, involvement is
about being aware of what's happening with your child's education, not just about volunteering in the
classroom.
Question No: 5 Write a note on advantages and disadvantages of criterion reference testing.
Answer:
CRITERION-REFERENCED TEST
A criterion-referenced test is designed to measure how well test takers have mastered a particular body of
knowledge. The term "criterion-referenced test" is not part of the everyday vocabulary in schools, and
yet, nearly all students take criterion-referenced tests on a routine basis. These tests generally have an
established "passing" score. Students know what the passing score is, and an individual's test score is determined by comparing his or her achievement to a set of criteria or standards. These criteria are established before candidates begin the test.
Usually, schools or districts set the standard as a percentage. The test-taker’s score shows how far they’ve
progressed toward the approved standard. If they miss the mark, they must work harder.
A good example is measuring your body temperature. The accepted normal level is 98.6 degrees
Fahrenheit. If your temperature is too high in comparison, you are running a fever.
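The idea of judging each score against a fixed standard can be sketched as follows; the 80% passing standard and the scores are invented for illustration.

```python
# Sketch: criterion-referenced scoring. Each student is judged against a
# fixed passing standard, not against other students. The 80% cut-off and
# the scores are invented for illustration.

PASSING_PERCENT = 80

def result(score, total):
    """Compare one student's percentage with the fixed standard."""
    percent = 100 * score / total
    return "pass" if percent >= PASSING_PERCENT else "needs more work"

print(result(42, 50))  # 84% meets the standard
print(result(35, 50))  # 70% falls short of the standard
```

Note that the judgment depends only on the cut-off: every student in a class could pass, or every student could fall short.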
Criterion-referenced evaluations are used in schools to examine specific knowledge and abilities that
students have most likely gained. This determines how close they are to mastering a standard. They allow
teachers to assess how they can help students improve in specific areas. Criterion-referenced evaluations will show you where your learners are in terms of an accepted standard, allowing you to tailor instruction accordingly. In contrast, standardized tests used to measure how well an individual does relative to other people who have taken the test are norm-referenced. Criterion-referenced assessment examples include driving tests and end-of-unit exams. Criterion-referenced tests can take several formats, including the following.

1. SURVEYS OR OBSERVATION FORMS
These could be about the following: the number of children served, the number of children handled by
their respective language groups, or the usual level of schooling of parents. Expected replies are on a scale
of 1 to 5 on an observation form or survey, etc. This information can be scored and examined.
2. MULTIPLE-CHOICE QUESTIONS
In this type of criterion-referenced test, multiple choices follow a single question. There is only one
answer and the scores depend on the number of correct answers chosen.
3. TRUE OR FALSE QUESTIONS
In this format, a given sentence can either be true or false. The student might be asked to select the correct
statement or the false statement, or state whether the given statement is true or false.
4. OPEN-ENDED QUESTIONS
In this format, the student may be asked to write a short answer, to write an essay, or to summarize a passage. Other open formats may also be used.
Criterion-referenced tests are more suitable than norm-referenced tests for tracking the progress of
students within a curriculum. Test items can be designed to match specific program objectives. The scores
on a criterion-referenced test indicate how well the individual can correctly answer questions on the material being studied, while the scores on a norm-referenced test report how the student scored relative to other test takers.
Assessing student progress is something that every teacher must do. Criterion-referenced tests can be
developed at the classroom level. If the standards are not met, teachers can specifically diagnose the
deficiencies. Scores for an individual student are independent of how other students perform. In addition,
test results can be quickly obtained to give students effective feedback on their performance. Although norm-referenced tests are most suitable for developing normative data across large groups, criterion-referenced tests are better suited to classroom-level assessment.

Criterion-referenced assessments are needs-based, meaning the tests are created around what the students'
needs are. If a student really needs to improve their knowledge of proper nouns, then a test will be created
on proper nouns.
When discussing the advantages of criterion-referenced tests, it is also important to mention that since students are judged against a fixed standard rather than against other students, they have a better chance of scoring high, which will help
improve their self-esteem as well. Studies show that students with special needs tend to have lower self-
esteem. Any way that we can help students feel better about themselves is a great opportunity.
One thing to remember is that each student is an individual and is different. By using criterion-referenced
assessments in your classroom, you can meet the individual needs of the students and differentiate your
assessments with the sole purpose of helping the students achieve to their fullest potential.
Criterion-referenced tests have some built-in disadvantages. Creating tests that are both valid and reliable
requires fairly extensive and expensive time and effort. In addition, results cannot be generalized beyond
the specific course or program. Such tests may also be compromised by students gaining access to test
questions prior to exams. Criterion-referenced tests are specific to a program and cannot be used to compare students across different programs or curricula.
Item analysis is used to measure the effectiveness of individual test items. The main purpose is to improve
tests, to identify questions that are too easy, too difficult or too susceptible to guessing. While test items
can be analyzed on both criterion-referenced and norm-referenced tests, the analysis is somewhat different for each type of test.
Items on norm-referenced tests need to discriminate between high and low performers because those tests
are generally used to make aptitude, proficiency or placement decisions. Criterion-referenced tests, in
contrast, are used to measure mastery of specific material and the goal is success for all students. The best
items on criterion-referenced tests are those that tap the important concepts.
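The item analysis described above can be sketched in code: item difficulty is the proportion of students answering the item correctly, and a simple discrimination index is the difference in success rates between the top and bottom halves of students ranked by total score. The response matrix below is invented for illustration.

```python
# Sketch: basic item analysis. Each row is one student's responses
# (1 = correct, 0 = incorrect); the data are invented for illustration.

students = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 0, 0, 0],
]

def difficulty(item):
    """Proportion of students answering the item correctly (higher = easier)."""
    return sum(row[item] for row in students) / len(students)

def discrimination(item):
    """Difference in item success rate between the top and bottom halves of
    students ranked by total score; positive values discriminate well."""
    ranked = sorted(students, key=sum, reverse=True)
    half = len(ranked) // 2
    top, bottom = ranked[:half], ranked[half:]
    p_top = sum(row[item] for row in top) / len(top)
    p_bottom = sum(row[item] for row in bottom) / len(bottom)
    return p_top - p_bottom

for i in range(len(students[0])):
    print(f"item {i}: difficulty={difficulty(i):.2f}, "
          f"discrimination={discrimination(i):+.2f}")
```

On a criterion-referenced test a very easy item (difficulty near 1.0) may simply mean the class has mastered the material, so difficulty values are interpreted against the objectives rather than used to rank students.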