Anila 8602
Assignment No. 2 (Units: 6-9)
Registration No: 0000090833
Answer:
VALIDITY
The validity of an assessment tool is the degree to which it measures what it is designed to measure.
For example, if a test is designed to measure the skill of adding three-digit numbers in mathematics, but the problems are presented in language too difficult for the ability level of the students, then it may not measure three-digit addition skill and consequently will not be a valid test. Many experts of measurement have defined this term; some of the definitions are given below. According to the Business Dictionary, validity is the degree to which an instrument, selection process, statistical technique, or test measures what it is designed to measure.
Cook and Campbell (1979) define validity as the appropriateness or correctness of inferences, decisions, or descriptions made on the basis of test results.
According to APA (American Psychological association) standards document the validity is the most
important consideration in test evaluation. The concept refers to the appropriateness, meaningfulness, and
usefulness of the specific inferences made from test scores. Test validation is the process of accumulating
evidence to support such inferences. Validity, however, is a unitary concept. Although evidence may be
accumulated in many ways, validity always refers to the degree to which that evidence supports the
inferences that are made from the scores. The inferences regarding specific uses of a test are validated, not the test itself.
In Howell's (1992) view, a valid test must measure specifically what it is intended to measure. According to Messick, validity is a matter of degree: a test is neither absolutely valid nor absolutely invalid.
He advocates that, over time, validity evidence will continue to gather, either enhancing or contradicting
previous findings.

EDUCATIONAL ASSESSMENT AND EVALUATION

Overall, we can say that in terms of assessment, validity refers to the extent to which a test's content is representative of the actual skills learned and whether the test allows accurate
conclusions concerning achievement. Therefore validity is the extent to which a test measures what it
claims to measure. It is vital for a test to be valid in order for the results to be accurately applied and
interpreted.
Validity concerns the appropriateness of particular uses of test scores; test validation is then the process of collecting evidence to justify the intended use of the scores. In order to collect evidence of validity, there are many types of validity methods that establish the usefulness of assessment tools. Some of them are described below.
Content Validity
Evidence of content validity comes from a judgmental process that may be formal or informal; the formal process follows a systematic procedure to arrive at a judgment. Content-related evidence involves the degree to which the content of the test matches a content domain associated with the construct. For example, a test of the ability to add two numbers should include a range of combinations of digits. A test with only one-digit numbers, or only even numbers, would not have good coverage of the content domain. Content-related evidence typically involves Subject Matter Experts (SMEs) evaluating test items against the test specifications. It is a non-statistical type of validity that involves the systematic examination of the test content to determine whether it covers a representative sample of the behaviour domain to be measured.
Curricular Validity
The extent to which the content of the test matches the objectives of a specific curriculum as it is formally
described. Curricular validity takes on particular importance in situations where tests are used for high-
stakes decisions, such as Punjab Examination Commission exams for fifth and eighth grade students and
Boards of Intermediate and Secondary Education Examinations. In these situations, curricular validity
means that the content of a test used to decide whether a student should be promoted to the next level should measure the curriculum that the student is taught in school. Curricular validity is evaluated by groups of curriculum/content experts, who are asked to judge whether the content of the test is parallel to the curriculum objectives and whether the test and curricular emphases are in proper balance. A table of specifications may help improve the validity of the test.
Construct Validity
Before defining the construct validity, it seems necessary to elaborate the concept of construct. It is the
concept or the characteristic that a test is designed to measure. A construct provides the target that a
particular assessment or set of assessments is designed to measure; it is a separate entity from the test
itself. According to Howell (1992), construct validity is a test's ability to measure factors that are relevant to the field of study. Construct validity is thus an assessment of the quality of an instrument or experimental design; it asks, 'Does it measure the construct it is supposed to measure?'

Construct validity refers to the extent to which operationalizations of a construct (e.g. practical tests
developed from a theory) do actually measure what the theory says they do. Construct validity evidence
involves the empirical and theoretical support for the interpretation of the construct. Such lines of
evidence include statistical analyses of the internal structure of the test including the relationships
between responses to different test items. They also include relationships between the test and measures
of other constructs. As currently understood, construct validity is not distinct from the support for the
substantive theory of the construct that the test is designed to measure. As such, experiments designed to
reveal aspects of the causal role of the construct also contribute to construct validity evidence.
Construct validity occurs when the theoretical constructs of cause and effect accurately represent the real-
world situations they are intended to model. This is related to how well the experiment is operationalised.
A good experiment turns the theory (constructs) into actual things you can measure. Sometimes just
finding out more about the construct (which itself must be valid) can be helpful. Construct validity addresses the constructs that are mapped into the test items; it is assured either by judgmental methods or by statistical analyses.
Criterion Validity
Criterion validity evidence involves the correlation between the test and a criterion variable (or variables)
taken as representative of the construct. In other words, it compares the test with other measures or
outcomes (the criteria) already held to be valid. If the test data and criterion data are collected at the same
time, this is referred to as concurrent validity evidence. If the test data is collected first in order to predict
criterion data collected at a later point in time, then this is referred to as predictive validity evidence.
Concurrent Validity
According to Howell (1992) “concurrent validity is determined using other existing and similar tests
which have been known to be valid as comparisons to a test being developed. There is no other known
valid test to measure the range of cultural issues tested for this specific group of subjects”. Concurrent
validity refers to the degree to which scores taken at one point correlate with other measures (test,
observation or interview) of the same construct that is measured at the same time. Returning to the
selection test example, this would mean that the tests are administered to current employees and then
correlated with their scores on performance reviews. This measures the relationship between measures made with existing tests; the existing test is thus the criterion. For example, a new measure of creativity could be validated against an established creativity test.
Predictive Validity
Predictive validity assures how well the test predicts some future behaviour of the examinee. It
refers to the degree to which the operationalizations can predict (or correlate with) other measures of the
same construct that are measured at some time in the future. Again, with the selection test example, this
would mean that the tests are administered to applicants, all applicants are hired, their performance is
reviewed at a later time, and then their scores on the two measures are correlated. This form of the
validity evidence is particularly useful and important for aptitude tests, which attempt to predict how well examinees will perform in the future.

This measures the extent to which a future level of a variable can be predicted from a current
measurement. This includes correlation with measurements made with different instruments. For example,
a political poll intends to measure future voting intent. College entry tests should have a high predictive
validity with regard to final exam results. When the two sets of scores are correlated, the resulting coefficient is called a validity coefficient.
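As a rough sketch of how a validity coefficient is obtained, the snippet below correlates entry-test scores with final-exam scores collected later, using the Pearson correlation. All scores are invented for illustration.

```python
# Sketch: a predictive-validity coefficient is the Pearson correlation
# between entry-test scores and criterion scores collected later.
# The scores below are invented for illustration.

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x ** 0.5 * var_y ** 0.5)

entry_test = [55, 62, 70, 48, 81, 66, 74, 59]   # scores at admission
final_exam = [58, 65, 72, 50, 85, 63, 78, 61]   # scores one year later

print(f"validity coefficient r = {pearson_r(entry_test, final_exam):.2f}")
```

A coefficient near +1 would indicate strong predictive validity; a coefficient near 0 would indicate that the entry test tells us little about later achievement.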
Question No: 2 What are the rules of writing Multiple choice test items?
Answer:
Norman E. Gronlund (1990) writes that the multiple choice question is probably the most popular as well
as the most widely applicable and effective type of objective test. Student selects a single response from a
list of options. It can be used effectively for any level of course outcome. It consists of two parts: the stem, which states the problem, and a list of three to five alternatives, one of which is the correct (key) answer while the others are distracters ("foils", incorrect options that draw the less knowledgeable pupil away from the correct response). The stem may be stated as a direct question or as an incomplete statement, for example:

Direct question: Which city is the capital of Pakistan?
Incomplete statement: The capital of Pakistan is
A. Lahore.
B. Karachi.
C. Islamabad.
D. Peshawar.
Only list plausible distracters, even if the number of options per question changes
Experts encourage multiple-choice items to be prepared as questions (rather than incomplete statements)
Use memory-plus application questions. These questions require students to recall principles, rules or facts and then apply them. The key to preparing memory-plus application questions is to place the concept in a life situation or context that requires the student to first recall the facts and then apply or transfer them to that situation.
Seek support from others who have experience writing higher-level thinking multiple-choice
questions.
6. Be Grammatically Correct
Students will be more likely to select the correct answer by finding the grammatically correct option
Avoid answering one question in the test by giving the answer somewhere else in the test
Have the test reviewed by someone who can find mistakes, clues, grammar and punctuation problems
Students may be able to find an incorrect answer without knowing the correct answer
9. Use Only One Correct Option (Or be sure the best option is clearly the best option)
The item should include one and only one correct or clearly best answer
With one correct answer, alternatives should be mutually exclusive and not overlapping
Such as:
Questions 1-10 are multiple-choice questions designed to assess your ability to remember or recall basic facts. Please read each question carefully before reading the answer options. When you have a clear idea of the question, find your answer and mark your selection on the answer sheet. Please do not make any stray marks on the answer sheet.
11. Use Only a Single, Clearly-Defined Problem and Include the Main Idea in the Question
14. Don’t Use MCQ When Other Item Types Are More Appropriate
Advantages
Can be written so that they test a wide range of higher-order thinking skills
Can cover lots of content areas on a single exam and still be answered in a class period
Disadvantages
Often test literacy skills: "if the student reads the question carefully, the answer is easy to recognize"
Provide unprepared students the opportunity to guess, and with guesses that are right, they get credit
Expose students to misinformation that can influence subsequent thinking about the content
Answer:
MEASUREMENT SCALES
Measurement scales are critical because they relate to the types of statistics you can use to analyze your data. An easy way to have a paper rejected is to have used either an incorrect scale/statistic combination or to have used a low-powered statistic on a high-powered set of data. The following four levels of measurement scales are commonly distinguished so that the proper analysis can be used on the data: a number can be used merely as a label, or it can also convey order or quantity. Each scale of measurement has properties that determine how to properly analyse the data. These properties are described below.
Properties of Measurement
Magnitude: Magnitude means that the values have an ordered relationship to one another, so there is a specific order to the values.
Equal intervals: Equal intervals mean that data points along the scale are equal, so the difference
between data points one and two will be the same as the difference between data points five and six.
A minimum value of zero: A minimum value of zero means the scale has a true zero point. Degrees,
for example, can fall below zero and still have meaning. But if you weigh nothing, you don’t exist.
By understanding the scale of measurement of their data, analysts can determine the kind of statistical analysis that is appropriate.
1. Nominal Scale
The nominal scale of measurement defines the identity property of data. This scale has certain
characteristics, but doesn’t have any form of numerical meaning. The data can be placed into categories
but can't be multiplied, divided, added or subtracted from one another. It's also not possible to measure the difference between data points.
Nominal scales are the lowest scales of measurement. A nominal scale, as the name implies, is simply
some placing of data into categories, without any order or structure. You are only allowed to examine if a
nominal scale datum is equal to some particular value or to count the number of occurrences of each
value. For example, the blood groups of classmates can be categorized into A, B, AB, O, etc. The only
mathematical operation we can perform with nominal data is to count. Variables assessed on a nominal
scale are called categorical variables; Categorical data are measured on nominal scales which merely
assign labels to distinguish categories. For example, gender is a nominal scale variable.

Classifying Nominal Data
Nominal with order: Some nominal data can be sub-categorised in order, such as "cold, warm, hot".
Nominal without order: Nominal data can also be sub-categorised as nominal without order, such as blood group or eye colour.
Dichotomous: Dichotomous data is defined by having only two categories or levels, such as "yes" and "no".
2. Ordinal Scale
The ordinal scale defines data that is placed in a specific order. While each value is ranked, there’s no
information that specifies what differentiates the categories from each other. These values can’t be added
to or subtracted from.
Something measured on an "ordinal" scale does have an evaluative connotation. You are also allowed to
examine if an ordinal scale datum is less than or greater than another value. For example rating of job
satisfaction on a scale from 1 to 10, with 10 representing complete satisfaction. With ordinal scales, we
only know that 2 is better than 1 or 10 is better than 9; we do not know by how much. It may vary. Hence,
you can 'rank' ordinal data, but you cannot 'quantify' differences between two ordinal values.
3. Interval Scale
The interval scale contains properties of nominal and ordered data, but the difference between data points
can be quantified. This type of data shows both the order of the variables and the exact differences
between the variables. They can be added to or subtracted from each other, but not multiplied or divided.
An ordinal scale with quantifiable differences between values becomes an interval scale. You are allowed to
quantify the difference between two interval scale values but there is no natural zero. A variable measured
on an interval scale gives information about more or better as ordinal scales do, but interval variables have
an equal distance between each value. The distance between 1 and 2 is equal to the distance between 9
and 10.
For example, temperature scales are interval data with 25C warmer than 20C and a 5C difference has
some physical meaning. Note that 0C is arbitrary, so that it does not make sense to say that 20C is twice
as hot as 10C, but there is exactly the same difference between 100C and 90C as there is between 42C and 32C.

4. Ratio Scale
Something measured on a ratio scale has the same properties that an interval scale has except, with a ratio
scaling, there is an absolute zero point. Ratio scales of measurement include properties from all four
scales of measurement. The data is nominal and defined by an identity, can be classified in order, contains
intervals and can be broken down into exact value. Weight, height and distance are all examples of ratio
variables. Data in the ratio scale can be added, subtracted, divided and multiplied.
Ratio scales also differ from interval scales in that the scale has a 'true zero': a value of zero means the absence of the quantity being measured. An example of this is height or weight, as someone cannot be zero centimetres tall or weigh zero kilos, or be negative centimetres or negative kilos. Examples of the use of this scale are calculating shares or sales. Of all types of data on the scales of measurement, data scientists can draw the most information from ratio data.
To summarize, nominal scales are used to label or describe values. Ordinal scales are used to provide
information about the specific order of the data points, mostly seen in the use of satisfaction surveys. The
interval scale is used to understand the order of data points and the differences between them. The ratio scale gives more
information about identity, order and difference, plus a breakdown of the numerical detail within each
data point.
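The summary above can be sketched in code: the mapping below records which operations each scale permits, following the rules in this section (counting only for nominal data, ranking for ordinal, differences for interval, and ratios only for ratio data). The mapping and examples are illustrative, not from the source.

```python
# Sketch: permissible statistics for each scale of measurement,
# following the rules described in this section.

PERMITTED = {
    "nominal":  {"count", "mode"},
    "ordinal":  {"count", "mode", "median", "rank"},
    "interval": {"count", "mode", "median", "rank", "mean", "difference"},
    "ratio":    {"count", "mode", "median", "rank", "mean", "difference", "ratio"},
}

def allowed(scale, statistic):
    """True if the statistic is meaningful for data measured on this scale."""
    return statistic in PERMITTED[scale]

# Blood group is nominal: we may count categories, but a 'mean blood group'
# is meaningless.
print(allowed("nominal", "count"))
print(allowed("nominal", "mean"))
# Celsius temperature is interval: differences are meaningful, ratios are not.
print(allowed("interval", "difference"))
print(allowed("interval", "ratio"))
```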
Answer:
Parent-Teacher Conferences
Parent-teacher conferences are mostly used in elementary schools. In such conferences portfolio are
discussed. This is a two-way flow of information and provides much information to the parents. But one
of the limitations is that many parents don't come to attend the conferences. It is also a time-consuming process. The student is also one of the key components of this process, since he/she is directly benefitted. In many
developed countries, it has become the most important way of informing parents about their children’s
work in school. Parent-teacher conferences are productive when they are carefully planned and skilfully conducted.
The parent-teacher conference is an extremely useful tool, but it shares three important limitations with the informal letter. First, it requires a substantial amount of time and skill. Second, it does not provide a systematic record of the student's progress. Third, some parents are unwilling to attend conferences, and attendance cannot be enforced. Parent-student-teacher conferences are frequently convened in many states of the USA
and some other advanced countries. In the US, this has become a striking feature of Charter Schools.
Some schools rely more on parent conferences than written reports for conveying the richness of how
students are doing or performing. In such cases, a school sometimes provides a narrative account of each student's progress.
Parent-teacher conferences have been an important part of educating generations of K–12 students.
Successful conferences require both parties to listen respectfully to what each has to say, and they are a valuable opportunity to work together in the student's interest.
A parent-teacher conference is a breeze when the student is doing well, but teachers are often in the
difficult position of explaining to parents that their child is struggling with some aspect of schoolwork.
This can trigger a wide range of responses. Some parents will blame their child, while others will blame the teacher.
Teachers need to assure parents that the conference isn’t about assigning blame. All parents can
encourage their children and advocate for them, including those who can’t provide much help with their
children’s homework.
The primary purpose of a parent-teacher conference is for the teacher to brief the parents on the child’s
academic progress and share anything notable about the child’s behavior and development at school.
It’s helpful to let parents know if their child is attentive in class, participates in discussions, or has some
potential that might only be obvious in the classroom. This is a great time to give parents suggestions
about how they can help their child and to tell them about any additional resources available, such as tutoring programs.
While the student’s academic progress is the main focus of parent-teacher conferences, the sessions are
much more than simply an opportunity to tell parents how their child can improve their grades.
Parent-teacher conferences, which are typically held only once per semester and seldom last more than 30
minutes, are also a rare opportunity for the teacher to learn from the parents about the child’s patterns for
doing school work, reading, and preparing for tests. As a teacher, you need to listen closely when parents
answer questions about anything that might affect the student’s academic performance.
When you’re assessing the student’s academic performance, you want to get as much information as
possible to determine whether they need help and what kind. It’s important to discuss factors outside the
classroom that could influence a student’s behavior, classroom focus, motivation, and their relationships
with schoolmates.
Your discussion with parents can touch on the student’s home life, family dynamics, or family finances.
While they’re sometimes awkward, frank discussions are necessary for the teacher to understand the
student’s challenges and for the parents to learn how they can help their child reach their academic
potential. It may be valuable to include school staff, such as counselors, in your meeting with the parents. While they are a lot of work, the most effective parent-teacher conferences boost family involvement in the child's education.
Parent-teacher conferences are not just for parents who have concerns about their child in the classroom.
These meetings provide an opportunity for parents to partner with teachers in providing a good learning environment for their child. The following tips can help parents make the most of a conference.
1. Prepare for the conference. Keep a folder with tests, papers, or any other topics you want to address.
Do your research by talking to your child; find out how she is doing in class, what’s happening at lunch
and recess.
2. Respect the teacher's time. Be prompt, try not to bring other children, and do not answer your cell phone during the conference.
3. Begin with a positive attitude. Start with a compliment for the teacher before addressing concerns.
4. Work together. Make it your purpose to partner with your teacher by being a team player. Approach issues the teacher raises with the attitude of "We have to work together on this problem."
5. Remember that this is about your child, not you. The teacher is not trying to tell you how to parent.
6. Listen and be open minded. A conference is for teachers to give parents information based on
classroom observation. The teacher would not bring up an issue if it was not necessary.
7. Teachers are not the enemy. If you have a problem, no need to go on the attack. They care about your
child and want to help you and your child resolve the issue.
8. Ask questions. After you've listened, it's your turn to ask questions. Make a list beforehand so you won't forget anything.
9. Learn the communication protocol. Ask the teacher her preferred method of communication: phone,
email, or notes. Let the teacher know you want to be an available parent.
10. Ask how you can be involved. Whether you work full-time, part-time or not at all, involvement is
about being aware of what's happening with your child's education, not just about volunteering in the
classroom.
Question No: 5 Write a note on advantages and disadvantages of criterion reference testing.
Answer:
CRITERION-REFERENCED TEST
A criterion-referenced test is designed to measure how well test takers have mastered a particular body of
knowledge. The term "criterion-referenced test" is not part of the everyday vocabulary in schools, and
yet, nearly all students take criterion-referenced tests on a routine basis. These tests generally have an
established "passing" score. Students know what the passing score is, and an individual's test score is determined by comparing his or her achievement to a set of criteria or standards. These criteria are established before candidates begin the test.
Usually, schools or districts set the standard as a percentage. The test-taker’s score shows how far they’ve
progressed toward the approved standard. If they miss the mark, they must work harder.
A good example is measuring your body temperature. The accepted normal level is 98.6 degrees
Fahrenheit. If your temperature is too high in comparison, you are running a fever.
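The idea of judging each score against a fixed standard can be sketched as follows; the 80% passing standard and the scores are invented for illustration.

```python
# Sketch: criterion-referenced scoring. Each student is judged against a
# fixed passing standard, not against other students. The 80% cut-off and
# the scores are invented for illustration.

PASSING_PERCENT = 80

def result(score, total):
    """Compare one student's percentage with the fixed standard."""
    percent = 100 * score / total
    return "pass" if percent >= PASSING_PERCENT else "needs more work"

print(result(42, 50))  # 84% meets the standard
print(result(35, 50))  # 70% falls short of the standard
```

Note that the judgment depends only on the cut-off: every student in a class could pass, or every student could fall short.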
Criterion-referenced evaluations are used in schools to examine specific knowledge and abilities that
students have most likely gained. This determines how close they are to mastering a standard. They allow
teachers to assess how they can help students improve in specific areas. Criterion-referenced evaluations will show you where your learners are in terms of an accepted standard, allowing you to tailor instruction accordingly. In contrast, standardized tests used to measure how well an individual does relative to other people who have taken the test are norm-referenced. Criterion-referenced assessment examples include driving tests and end-of-unit exams. Criterion-referenced tests can take several formats, including the following.

1. SURVEYS OR OBSERVATION FORMS
These could be about the following: the number of children served, the number of children handled by
their respective language groups, or the usual level of schooling of parents. Expected replies are on a scale
of 1 to 5 on an observation form or survey, etc. This information can be scored and examined.
2. MULTIPLE-CHOICE QUESTIONS
In this type of criterion-referenced test, multiple choices follow a single question. There is only one
answer and the scores depend on the number of correct answers chosen.
3. TRUE OR FALSE QUESTIONS
In this format, a given sentence can either be true or false. The student might be asked to select the correct
statement or the false statement, or state whether the given statement is true or false.
4. OPEN-ENDED QUESTIONS
In this format, the student may be asked to write a short answer, to write an essay, or to summarize a passage. Other open formats may also be used.
Criterion-referenced tests are more suitable than norm-referenced tests for tracking the progress of
students within a curriculum. Test items can be designed to match specific program objectives. The scores
on a criterion-referenced test indicate how well the individual can correctly answer questions on the material being studied, while the scores on a norm-referenced test report how the student scored relative to other test takers.
Assessing student progress is something that every teacher must do. Criterion-referenced tests can be
developed at the classroom level. If the standards are not met, teachers can specifically diagnose the
deficiencies. Scores for an individual student are independent of how other students perform. In addition,
test results can be quickly obtained to give students effective feedback on their performance. Although norm-referenced tests are most suitable for developing normative data across large groups, criterion-referenced tests are better suited to classroom-level assessment.

Criterion-referenced assessments are needs-based, meaning the tests are created around what the students'
needs are. If a student really needs to improve their knowledge of proper nouns, then a test will be created
on proper nouns.
When discussing the advantages of criterion-referenced tests, it is also important to mention that since students are judged against a fixed standard rather than against other students, they have a better chance of scoring high, which will help
improve their self-esteem as well. Studies show that students with special needs tend to have lower self-
esteem. Any way that we can help students feel better about themselves is a great opportunity.
One thing to remember is that each student is an individual and is different. By using criterion-referenced
assessments in your classroom, you can meet the individual needs of the students and differentiate your
assessments with the sole purpose of helping the students achieve to their fullest potential.
Criterion-referenced tests have some built-in disadvantages. Creating tests that are both valid and reliable
requires fairly extensive and expensive time and effort. In addition, results cannot be generalized beyond
the specific course or program. Such tests may also be compromised by students gaining access to test
questions prior to exams. Criterion-referenced tests are specific to a program and cannot be used to compare students across different programs or curricula.
Item analysis is used to measure the effectiveness of individual test items. The main purpose is to improve
tests, to identify questions that are too easy, too difficult or too susceptible to guessing. While test items
can be analyzed on both criterion-referenced and norm-referenced tests, the analysis is somewhat different for each type of test.
Items on norm-referenced tests need to discriminate between high and low performers because those tests
are generally used to make aptitude, proficiency or placement decisions. Criterion-referenced tests, in
contrast, are used to measure mastery of specific material and the goal is success for all students. The best
items on criterion-referenced tests are those that tap the important concepts.
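The item analysis described above can be sketched in code: item difficulty is the proportion of students answering the item correctly, and a simple discrimination index is the difference in success rates between the top and bottom halves of students ranked by total score. The response matrix below is invented for illustration.

```python
# Sketch: basic item analysis. Each row is one student's responses
# (1 = correct, 0 = incorrect); the data are invented for illustration.

students = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 0, 0, 0],
]

def difficulty(item):
    """Proportion of students answering the item correctly (higher = easier)."""
    return sum(row[item] for row in students) / len(students)

def discrimination(item):
    """Difference in item success rate between the top and bottom halves of
    students ranked by total score; positive values discriminate well."""
    ranked = sorted(students, key=sum, reverse=True)
    half = len(ranked) // 2
    top, bottom = ranked[:half], ranked[half:]
    p_top = sum(row[item] for row in top) / len(top)
    p_bottom = sum(row[item] for row in bottom) / len(bottom)
    return p_top - p_bottom

for i in range(len(students[0])):
    print(f"item {i}: difficulty={difficulty(i):.2f}, "
          f"discrimination={discrimination(i):+.2f}")
```

On a criterion-referenced test a very easy item (difficulty near 1.0) may simply mean the class has mastered the material, so difficulty values are interpreted against the objectives rather than used to rank students.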