Fed 313
Education
Education is concerned with the handing on of beliefs, moral standards, knowledge and skills. It is about change – changes in the learner’s intellectual capacity, ability to manipulate fine and gross muscles, attitudes, interests, values, beliefs, interpersonal relationship skills, and character. The school is accountable to society. Parents need feedback on the educational attainments of their children. Above all, an understanding of pupils’ progress in school helps teachers to appraise their methods of teaching, the effectiveness of instructional materials, and the extent to which the objectives of a particular course of instruction have been achieved. In this paper the term assessment will be used synonymously with measurement.
Assessment
Assessment, according to Shertzer and Linden (1979), refers to the procedures and processes employed in collecting information about or evidence of human behaviour. The measurement of certain dimensions or characteristics of human beings is obtained through assessment instruments such as educational and psychological tests and inventories.
Measurement
Chase (1978) defined measurement as the process of using numbers to describe quantity, quality or frequency according to a set of rules. It involves assigning numbers to attributes or characteristics of persons, objects or events according to explicit formulations or rules. The purpose of measurement is to collect quantitative information about the existence of a specified attribute in a given object, person or event. In measurement, we ask the question: How much? In educational measurement, we try to quantify the attributes of pupils according to specified rules. What is measured are the attributes or behaviour characteristics of pupils, not the pupils themselves.
Evaluation
Evaluation is the process of making value judgements for the purpose of decision-making. It is simply a process through which value judgements or decisions are made from a variety of observations or test results. It also involves the inspection of all
available information concerning the student, teacher and entire educational programme for the purpose of making valid
judgements about the degree of change in students and the effectiveness of the educational programme. In evaluation, you
make value judgements based on the quantitative information provided by the measurement instruments. You do this by
saying, for example, pass or fail, enough or not enough, satisfactory or unsatisfactory, etc.
The measurement instruments are classified according to the three domains of behaviour namely, cognitive, affective and
psychomotor.
(a) Instruments for Measuring Cognitive Behaviour - Examples include teacher-made achievement tests, standardised achievement tests, intelligence tests, aptitude tests, and teacher ratings.
(b) Instruments for Measuring Affective Behaviour - Examples include attitude scales, interest inventories, personality tests and sociometric tests.
(c) Instruments for Measuring Psychomotor Behaviour - Examples include performance tests, observational checklists and rating scales.
Rating Scales
Rating scales are scales for rating each of the characteristics or activities one is seeking to observe or assess. They enable an
observer to systematically observe an individual and to record those observations. Rating scales provide a list of characteristics
(work products, events in a series), and these are to be judged on a scale that reflects the degree to which the desired qualities
or quantities were evident.
Checklists
Checklists usually contain lists of behaviours or characteristics that are either present or absent. A checklist does not require the observer to indicate the degree or extent to which a characteristic is present. Checklists are used to assess products and skills in the use of tools and materials. The observer inspects the product or process and notes the presence or absence of each item on the checklist. Checklists are used primarily where we wish to observe whether an event is done or not done, or an element is present or not present.
Intelligence Tests
These are batteries of tests used to determine a person’s level of intelligence. Intelligence tests are standardised tests designed to assess a person’s functional and adaptive capabilities in various settings and situations. Intelligence tests throw light on an individual’s general mental ability to reason and capacity to learn. They are useful in identifying students in need of special attention in school, in diagnosing cognitive difficulties, and in helping people make optimal educational and vocational choices.
They measure the extent to which an individual's innate potential has been modified or developed within his or her
environment.
Aptitude Tests
Aptitude tests are those tests that measure an individual's potential to achieve in a given activity or to learn to achieve in that
activity (Gibson and Mitchell, 1979). They attempt to predict the degree of achievement that may be expected from individuals
in a particular activity. The purpose of aptitude testing is to predict how well an individual will perform on some criterion (such
as school grades, teacher’s ratings or job performances) before training or instruction is begun or selection or placement
decisions are made.
Personality Tests
Personality tests are instruments for measuring the affective or non-intellectual aspects of behaviour for personal counselling.
They are used to measure such aspects of personality as emotional stability, friendliness, motivation, dominance, interests,
attitude, leadership, self-concept, sociability, and introversion-extroversion. Examples of personality tests include the Mooney Problem Checklist, Edwards Personal Preference Scale, Edwards Personality Inventory, Tennessee Self-Concept Scale, Students Problem Inventory, Minnesota Multiphasic Personality Inventory (MMPI) and projective tests.
Attitude Scales
Attitude scales are self-report inventories designed to measure the extent to which an individual has favourable or unfavourable feelings toward some person, group, object, institution or idea. They are used where the individual has little reason for distorting the results. Examples include the social distance, Thurstone and Likert scales.
Interest Inventories
Interest inventories attempt to yield a measure of the types of activities that an individual tends to like and choose. Interest inventories are commonly used to measure students’ vocational or occupational preferences. The vocational interest inventories comprise activities drawn from the work of people in different occupations. The individual is required to indicate his interest in those activities. This is followed by the assignment of an empirically determined weight to the individual’s responses. The performance of the individual on the inventory is then compared with that of those who have been successful in the occupations.
Achievement Tests
An achievement test is a test that measures the extent to which a person has "achieved" something, acquired certain
information, or mastered certain skills - usually as a result of planned instruction or training (Mehrens and Lehmann, 1978).
Achievement tests include all kinds of devices used primarily to assess how well the instructional objectives have been attained.
Achievement tests are designed to measure the degree of students’ learning in specific curriculum areas common to most schools, such as mathematics, English usage, and reading. The test may be a paper-and-pencil device administered to all children at one time or a set of oral questions administered individually. Three broad types can be distinguished:
1. End of course achievement tests that measure specifically what a student has learned in a particular subject.
2. General achievement tests that cover a student’s learning in a broad field of knowledge and can be given to students
who have taken quite different courses of study within a field.
3. Tests that measure the critical skills a student has learned and his or her ability to use these skills in solving new
problems.
Achievement tests can also be classified according to standardisation (teacher-made versus standardised tests), reference (norm-referenced versus criterion-referenced tests) or type of measure.
Two major forms of written classroom tests are available for general use. These are essay tests and objective tests. It is
therefore imperative that every teacher is proficient in constructing achievement tests in his subject area.
In constructing teacher-made tests, Sax (1980) outlined the following activities or procedure:
An instructional objective is a statement that describes in behavioural terms what the student should be able to do, the
conditions under which the task is to be performed, and the criterion for acceptable performance. It describes in behavioural
terms what the student should be able to do after completing a prescribed unit of instruction. To achieve this, we make use of
suitable action verbs to phrase the objectives. An instructional objective statement has the following elements: who, action,
product, conditions and standards for minimum acceptance.
Examples of Well-Stated Instructional Objectives
1. Demonstrate and play the underarm serve in volleyball with about 80% accuracy as judged by the teacher.
2. Solve accurately at least eight out of ten questions on set theory.
3. Recall without error, the formula for computing the rank difference correlation coefficient.
4. Enumerate without the aid of the notebook, at least six uses of the ocean basin.
5. In thirty minutes or less, discuss at least three ways the baptism of John differed from that of Jesus Christ.
A test blueprint is a table which relates outcomes to content and indicates the relative weight to be given to each of the various areas. The purpose of the table is to provide assurance that the test will measure a representative sample of the learning outcomes and the subject-matter content to be measured.
[Table of specifications: content areas and their percentage weights against the cognitive levels from Knowledge to Evaluation, with the number of items in each cell. The totals row reads: 100%; 12, 11, 12, 10, 8 and 7 items across the levels; 61 items in all.]
Note that in these calculations, the number of items has been rounded off to the nearest whole number while keeping to the prescribed percentages.
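The rounding of item numbers can be sketched as follows. The content areas, weights and test length here are hypothetical illustrations, not those of the table above.

```python
# Sketch: allocating items in a test blueprint from percentage weights.
# The content areas, weights and test length below are hypothetical.

def allocate_items(weights, total_items):
    """Round each area's percentage share of the test to the nearest whole item."""
    return {area: round(pct / 100 * total_items) for area, pct in weights.items()}

weights = {"Sets": 20, "Algebra": 30, "Geometry": 25, "Statistics": 25}  # sums to 100%
allocation = allocate_items(weights, total_items=60)
print(allocation)  # {'Sets': 12, 'Algebra': 18, 'Geometry': 15, 'Statistics': 15}
```

When the shares are not exact, rounding can make the column totals differ slightly from the intended test length, which is why the totals are checked after rounding.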
Essay tests consist of a list of questions for which the subject (student) is required to write out the answer. An essay item is a
question or situation with instruction, which requires the testee to organise a complete thought in one or more written
sentences. The testee is given freedom to generate responses, which must be assessed by a scorer who is knowledgeable in the
subject area.
Essay questions are subdivided into two major types – restricted and extended response – depending on the amount of latitude or freedom given the student to organise his ideas and write his answer. The amount of restriction in an essay question depends on the educational level of the testee and the type of information required.
Objective tests are tests in which every question is set in such a way as to have only one right answer. The opinion of the examiner or marker does not come into play in judging whether an answer is good or bad, acceptable or unacceptable, right or wrong. In other words, there is no subjective element involved. The items are constructed in such a way as to have one predetermined correct answer.
Short-Answer Items
The short-answer item (also called the supply answer or completion item) presents a task in a sentence in which a word, a
number, a symbol, or a series of words has been omitted. The items call for only one response for a blank or a specific series of
responses for a series of blanks.
Alternate-Choice Items
In the alternate-choice item, the students are given two options from which to choose one. Such options include yes-no, true-false, right-wrong, and correct-incorrect.
Matching Items
The matching item presents two lists, usually called the premises and responses. The premises list consists of the questions or problems to be answered, while the responses list contains the answers. Generally, the two lists have things in common; for example, lists of authors and books, inventions and inventors, historical events and dates, states and capitals, antonyms and synonyms, words and opposites, etc. The students are directed to match each premise with the corresponding response.
Multiple-Choice Items
The multiple-choice item consists of a stem and a branch. The stem presents the problem as either an incomplete statement or a question, while the branch presents a list of suggested answers (responses or options). There are usually four or five options. Among the options, only one is the correct answer (or the key). The incorrect options are called distracters. A distracter is a plausible but wrong answer designed to confuse the student who does not know the correct answer. From the list of responses provided, the student is required to select the correct (or best) one.
A well-prepared examiner should be aware of the multiplicity of factors entering into test performance, the kind of
environmental conditions, and the special problems created by the current tendency to administer tests with separate answer
sheets. He should also have at hand a list of procedures to be followed in preparing for the test, during the test, and after the
test has been given.
The following are general preparations to make before administering a teacher-made achievement test.
1. The students should be informed as early as possible about the specific date of the test so that they can plan ahead in view of their other commitments. A test that is announced early will provide opportunity for quality preparation and keep students’ anxieties at acceptable levels.
2. The physical conditions should be as comfortable as possible and the students should be as relaxed as possible.
3. The examination venue must be well ventilated and free from distractions.
4. Provide sufficient writing desks and chairs for the students.
5. The sitting arrangement must be such as to ensure free movement of invigilators. The space between rows should be wide enough to prevent copying through giraffing.
6. Provide enough question papers and answer scripts.
7. Provide for first aid in case of emergency.
8. Arrange for sufficient invigilators. The invigilators must be people of proven integrity. If possible, provide for male and female supervisors. Both genders are needed in case a student may want to go to the toilet or during bodily searches on entering the examination hall.
9. Arrange for security personnel to forestall any possible breach of peace by unruly students.
10. Provide a wall clock for time keeping.
The factors to be taken into consideration in the marking of objective tests include the scoring formula to be employed, the
weighting of items and parts of test, the kinds of provision for responses, and the type of keys to be used. Objective tests are
very easy to mark. The most common practice is to mark only the correct answer (or key). Here each item on the test carries
equal marks, usually one point. An individual’s total score is the sum of the items marked right. Objective tests can be hand-
scored or machine-scored.
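The common practice described above – one point per correct answer, with the total score being the number of items marked right – can be sketched as follows. The answer key and the student’s responses are made up for illustration.

```python
# Sketch: scoring an objective test against a key, one point per correct item.
# The key and the student's responses below are hypothetical.

def score_objective(key, responses):
    """Total score is the number of responses that match the key."""
    return sum(1 for k, r in zip(key, responses) if k == r)

key = ["B", "D", "A", "C", "B"]
responses = ["B", "D", "C", "C", "A"]
print(score_objective(key, responses))  # 3
```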
The effectiveness of an essay examination depends on how well it is graded. In grading the essay responses, one must use
appropriate methods to minimise biases, pay attention only to the significant and relevant aspects of the answer, be careful not
to let personal idiosyncrasies affect grading, and apply uniform standards to all the papers (Mehrens and Lehmann, 1978).
There are two major approaches to marking essay questions – the general impression and point-score methods.
The General Impression Method
As the name implies, this method relies mainly on the teacher’s impression of the extent to which a student’s answer fits the ideal answer in his mind. It does not make use of a written marking scheme. The general impression method is suitable for marking long essays such as compositions. The use of this method to mark essay questions is based on some assumptions. According to Nwana (1982), this technique assumes that:
1. There is an accepted body of knowledge which, put together, will form the answer to the question.
2. The examiner had a good command of this body of knowledge and can identify it in the pupil’s answer.
3. The teacher’s impression of the answer is unaffected by his previous knowledge or lack of it, with particular reference
to the topic under question.
The general impression method has two procedures - impressionistic marking of one question at a time and sorting or
global technique.
Impressionistic Marking of One Question at a Time
In this procedure, the teacher or examiner reads the answer to a question, forms a general impression of how it fits an ideal answer in his mind, and thereafter awards a mark out of the maximum assigned to that question. There is no written marking scheme; rather, the major points expected in the answer exist in the memory of the examiner.
The Sorting (Global) Technique
In the sorting technique (sometimes called the global method), the examiner does not have to read every section or main idea before arriving at how many marks the student is to receive on a question; rather, he reads the entire answer and estimates its overall quality. The answer scripts are placed or sorted into grade piles according to the varying qualities of the responses and the levels of discrimination needed.
After the scripts are initially sorted, there is need to re-read those scripts in each pile to ensure homogeneity. As a final
check, a third reading is undertaken to ensure that the scripts in each pile are all at the same level. At this stage, a script may be
moved up or down as the case may be. The purpose is to sort in such a way as to maximise the differences between groups and
minimise the differences within each group.
After the marker is satisfied that the scripts are properly sorted (or categorised), he then proceeds to award the final mark
by assigning the same grade to every script in a pile. For example, the scripts in pile A are awarded an A grade, while those in
the second pile are awarded a B grade and so on.
The Point-Score Method
In the point-score method (sometimes called the analytical method), the examiner writes out a detailed list of major points to be covered by the answer. These points constitute the ideal or model answer. The ideal or model answer is broken down into specific points. The student’s score is based upon the number of points contained in his answer. In addition, component parts such as “effectiveness of expression”, “logical organisation” and “support of statements” are specified and assigned points or values. By so doing, we end up with a checklist that can be used quite objectively.
This scoring method is used in marking essays in public examinations such as the General Certificate of Education (G.C.E.) and Senior School Certificate Examination (SSCE). The point-score technique has three major stages – developing a marking scheme, marking proper and totalling of points.
Developing a marking scheme
The marking scheme consists of the model answer, that is, the essential points to each question. As much as possible, the major points relevant to the question should be included in the marking scheme. Also contained in the marking scheme is the relative number of marks to be awarded to each point. The examiner should decide in advance on what marks (if any) are to be awarded to such factors as grammar, spelling, punctuation, neatness, presentation and handwriting. Allowance should also be made for originality, that is, relevant points given by the candidates which are not contained in the model answer.
Marking proper
At this stage, the teacher reads the responses to a particular question and gives points for those component parts contained in the answer. He begins by marking one question at a time for all those that attempted that question, using the marking scheme as a guide. Thereafter, he marks the remaining questions, one at a time. Marking one question at a time increases objectivity and enables the examiner to maintain a consistent mental set about the relevant points in the model answer to that question. The examiner is expected to cover the student’s name before marking his script. In addition, the examiner should try not to look at the scores of previously marked questions when evaluating the remaining questions.
Totalling of points
At this final stage, the examiner totals the points awarded to each answer script.
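The three stages – developing a marking scheme, marking proper and totalling of points – can be sketched as follows. The scheme entries, their mark values and the points found in the answer are all invented for illustration.

```python
# Sketch of the point-score (analytical) method: the marking scheme lists the
# expected points and the marks each carries; a script earns the marks for
# every point it contains. The scheme and the points found are hypothetical.

scheme = {
    "names the leaders of the jihad": 2,
    "gives one political cause": 1,
    "gives one economic cause": 1,
    "logical organisation": 1,
}

def mark_answer(points_found):
    """Total the marks for every scheme point present in the answer."""
    return sum(marks for point, marks in scheme.items() if point in points_found)

# Points identified while reading one hypothetical script:
found = {"names the leaders of the jihad", "gives one economic cause"}
print(mark_answer(found))  # 3
```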
Your senior brother studying in the U.S.A. has just written home saying he would like to spend the next Christmas holidays at
home. Write him a reply briefly describing, among other things, the preparation being made to receive him, and two recent
events at home which have brought some changes to the family.
1. Content (5 marks)
2. Organisation (4 marks)
Clear setting of writer’s address, date; paragraphing, opening and closing of letter; addressing the envelope.
An informal letter to a brother studying overseas should be informative and conversational in tone (simple precise
language, correct choice of words and expressions).
History Question
What were the causes and consequences of the Fulani jihad on northern Nigeria?
(iii) 18th Century Jihad by Ibrahim Musa, Suleiman Bal, Uthman Dan Fodio
Political, religious, economic (e.g. excessive taxation of the Fulanis); social (e.g. class struggle, oppressive rule of the Habe rulers).
3. Consequences (7 marks)
Obe (1980) outlined the following advantages and disadvantages of point score method of marking essay questions.
Advantages of the Point Score Method
1. The point score technique is based on a definite marking scheme and therefore defensible.
2. It improves the consistency of marking as the marking of one question at a time helps to sustain the mental set.
3. Boredom and fatigue can be minimised by having short marking sessions per question, with periods of relaxation in
between.
4. An external examiner easily moderates the marking.
5. It is generally more reliable.
The major way to ensure the reliability of essay marking is the use of a marking scheme. In addition, Mehrens and Lehmann (1978) offered the following suggestions:
1. Check your marking scheme against actual responses. Before marking, the teacher should select a few papers at random to
ascertain the appropriateness of the marking guide.
2. Be consistent in your grading. Ensure you are not influenced by the first few papers you read and thereby mark either too leniently or too harshly, depending on your initial mindset. Occasionally refer to the first few papers marked to ensure that the standards are being applied consistently.
3. Randomly shuffle the papers before grading them. It is generally assumed that the student’s essay grade may be influenced
by the position of his paper, especially if the preceding scripts were very good or very poor. Shuffling the papers at random
before marking minimises the effects of the preceding grades.
4. Mark only one question at a time for all papers.
5. Mark the scripts anonymously. To protect the student from teacher bias, teachers are advised to cover the names of the students before marking their papers.
6. The mechanics of expression should be judged separately from what the student writes. The proportion of the question’s
points to be assigned to such factors as legibility, spelling, punctuation and grammar should be spelt out in the marking
guide and the students should be so informed.
7. Try to score all responses to a particular question without interruption.
8. If possible, have two independent readings of the script and use the average as the final score. A double reading by two
independent readers will make the scores more reliable.
9. Provide comments and correct errors.
UNIVERSITY OF LAGOS
Faculty: Education
Session: 2019/2020
Semester: First
COURSE DESCRIPTION
This course is designed to equip student teachers with skills to assess and report the extent of changes that have taken place in students after a prescribed period of instruction. Similarly, students are exposed to different ways to appraise learners’ characteristics such as attitudes, aptitudes, interests, values, social relations, personality and intelligence in order to gain a better understanding of the extent to which they possess these attributes. This course deals extensively with various devices to measure behaviour changes of students in the cognitive, affective and psychomotor domains.
LEARNING OUTCOMES
By the end of this course students should have achieved the following objectives:
1. Gain mastery of the key concepts in measurement and evaluation such as measurement, tests, evaluation, scales of measurement and types of evaluation techniques in the classroom.
2. Get acquainted with various types of tests and other measurement instruments to assess behaviour changes in the cognitive, affective and psychomotor domains.
3. Develop a test blueprint and use it to construct essay and objective tests to cover specified objectives.
4. Administer a test, score and undertake analysis of test items with a view to improving the overall quality of the test.
COURSE RESOURCES
Hopkins, C. D. & Antes (1985). Classroom Measurement and Evaluation. Illinois: F. E. Peacock Publishers, Inc.
Mehrens, W. A. & Lehmann, I. J. (1978). Measurement and Evaluation in Educational Psychology. New York: Holt, Rinehart & Winston.
Nwana, O. C. (1982). Educational Measurement and Evaluation for Teachers. Lagos: Thomas Nelson (Nigeria) Ltd.
Obe, E. O. (1980). Educational Testing in West Africa. Lagos: Premier Press & Publishers.
Okoli, C. E. (2005). Introduction to Educational and Psychological Measurement. Lagos: Behenu Press and Publishers.
Sax, G. (1980). Principles of Educational and Psychological Measurement and Evaluation. California: Wadsworth Publishing Co.
COURSE CONTENT
Attendance
Students enrolled in each course are expected to attain 65% attendance at lectures in order to be eligible for examination.
Evaluation
Examination = 60 marks.
Total = 100 marks.
SCALES OF MEASUREMENT
There are four scales of measurement. These are nominal, ordinal, interval and ratio scales.
(a) Nominal scale: Measurement is on a nominal scale whenever numbers are used merely to describe or name rather than to
indicate the order or amount of something. The only measure of a characteristic available from the nominal scale is equivalence
(all items in a given category or with a given scale value are equal). Measurement on a nominal scale entails using numbers or
letters to represent, for example, states of origin, sex, occupation, numbers on football uniforms, etc. These numbers are used
to designate individuals or groups; no comparison can be made based on magnitude.
(b) Ordinal scale: This is a scale of rank or relative importance. It contains two degrees of measure: equivalence and relative importance (greater than or less than). Measurement is on an ordinal scale when numbers refer to ranks of objects or events on some order of merit, for example, using numbers to designate the order of finishing a race or contest, or positions after an examination. The differences between successive ranks are not necessarily equal. Ordinal scales indicate greater than or less than; they provide no information on how much difference exists between two ranks. For example, we cannot say with certainty that a student who came second in a class test performed twice as well as the student who came fourth.
(c) Interval scale: A measurement is on an interval scale when equal differences can be interpreted as being equal in whatever characteristic is being measured. The interval scale has the properties of equivalence, relative importance and a unique unit of measurement (equal differences). The Celsius scale of temperature is a good illustration of an equal-interval scale. For example, we can say with certainty that the difference between 60°C and 40°C is equal to the difference between 30°C and 10°C, both numerically and in terms of temperature (heat). The scores of intelligence tests and classroom achievement tests are measured on the interval scale. One limitation of the interval scale is the lack of an absolute zero point.
(d) Ratio scale: A ratio scale has all the properties of an interval scale plus an absolute or true zero. Examples are the scales on which we measure time, length, and weight.
USES OF MEASUREMENT AND EVALUATION
1. It serves as the tool for decision-making, especially to admit students into various institutions of learning.
2. The teacher uses measurement and evaluation to plan his teaching and to make decisions concerning the pace and
progress of teaching.
3. It provides information needed in the guidance and counselling of students.
4. It serves as a tool to promote students to new classes after each academic year.
5. It is used for certification of students at the end of their training programme.
6. Measurement and evaluation provide the basis for the modification of an instructional curriculum.
7. It serves as a means of reporting students’ progress to their parents and guardians.
TYPES OF EVALUATION
There are four major evaluation techniques. These are placement, formative, diagnostic, and summative evaluation.
Placement Evaluation
This is used to appraise the students’ entering behaviour. The entering behaviour embraces everything that the student is
carrying with him as he waits for the lesson to commence. It is the learning a child has acquired that is relevant to the lesson or
topic to be taught. It is not necessarily the knowledge acquired in the previous lesson, but in some cases, the knowledge
acquired in the previous lesson may have relevance to the topic to be taught. The entry behaviour determines the readiness
level of the students. The teacher accomplishes the role of placement evaluation through the administration of pre-tests. It
could be a readiness or placement test. This enables the teacher to get the students psychologically and intellectually ready for
his teaching.
Formative Evaluation
The formative evaluation is used while the lesson is being taught to monitor the learning progress. The formative evaluation
helps to form the students’ new behaviour. This is accomplished by supplying feedback to the students. To be effective, the feedback should be informative and rewarding. Informative feedback tells the student the extent to which he is responding correctly or incorrectly and identifies the specific learning errors that need correction. Rewarding feedback serves as reinforcement of successful learning. This could be in the form of praise, smiles, clapping of hands, etc. Formative evaluation is not for the assigning of grades. The teacher makes use of oral questions, short written exercises and other activities.
Diagnostic Evaluation
The teacher uses diagnostic evaluation to determine the causes and areas of a student’s learning difficulties and to formulate a plan for remedial action. When a student is found to have persistent and recurring difficulties that are left unresolved by the corrective prescriptions of formative evaluation, diagnostic tests are administered to ascertain the specific areas of weakness in order to improve his performance.
Summative Evaluation
This is a type of evaluation done at the end of a course to assign grades or to measure the success of the instruction. How much
a student has learned is determined by subtracting his entry behaviour from the terminal behaviour. Summative evaluation is
used to ascertain the extent to which instructional objectives have been achieved. It is used primarily for assigning of course
grades and for certifying pupil mastery of the intended learning outcomes. The teacher-made tests and observational
techniques are used.
ITEM ANALYSIS
Definition
Item analysis is the process of examining the students’ responses to each test item in order to judge the quality of
the items. It is a statistical technique of reviewing every item on the test with a view to refining the whole test. The
technique helps us not only to identify poor items, but also to decide why an item is not functioning as one had
planned. Items can be analysed qualitatively, in terms of their content and form, and quantitatively, in terms of their
statistical properties. Qualitative analysis includes the consideration of content validity, and the evaluation of items in
terms of effective item-writing procedures. Quantitative analysis includes principally the measurement of item
difficulty and item discrimination. Item analysis helps us to answer such questions as the following for each item:
1. How hard is the item?
3. Do all the options attract responses, or are there some that are so unattractive that they might as well not
be included?
Three things are considered in item analysis, namely: item difficulty, discriminating ability of the item and item
choice analysis.
Item Difficulty
The item difficulty pertains to the easiness of the item. A good item should be neither too easy nor too difficult. The
difficulty index of an item is the proportion of the testees who got the item right. The difficulty index ranges from
zero to one (or from zero to 100%). Items whose difficulty index ranges from 20 to 80 per cent are acceptable. The
best difficulty index is 0.5. An item should not be so difficult that almost all the testees miss it, or so easy that
every testee gets it right. The formula for item difficulty (P) is:

P = (U + L) / N

Where:
U = number of students in the upper group who got the item right.
L = number of students in the lower group who got the item right.
N = total number of students in both groups.
Item Discrimination
The discriminating power of an item is the extent to which the item distinguishes between those who scored high and
those who scored low in the test. It measures how well a test item contributes to separating the upper and lower
groups. Item discrimination tells us if an item is showing the differences between capable and less capable students. A
good discriminating item is one which a greater number of the students who scored highly get right and few of the
students who scored very low get right. The discrimination index (D) can take values ranging from –1.00 to +1.00.
The higher the D value, the better the item discrimination. Any item that has a D value of +.40 and above is
considered very effective. However, D values that range between +.20 and +.39 are considered satisfactory. Any
item that has a negative value should be discarded. The formula for the item discrimination power is:

D = (U – L) / n

Where:
U = number of students in the upper group who got the item right.
L = number of students in the lower group who got the item right.
n = number of students in each group.
Item Choice Analysis
This is done to determine the effectiveness of the distracters. After identifying the poor test items, such as items that
are too easy, too difficult, or those with zero or negative discrimination, there is a need to ascertain what is wrong
with these items. Analysing the effectiveness of the distracters entails a comparison of the responses of students in the
upper and lower groups. In doing this, the following points should be kept in mind:
1. Each distracter should be selected by about an equal number of students in the lower group.
2. Substantially more students in the upper group than in the lower group should respond to the correct alternative.
3. Substantially more students in the lower group should respond to each distracter. If a distracter attracts only
the upper group, it is likely faulty. If it distracts both groups equally, replace it; this could be because of vague
wording or an implausible option. A distracter that is not chosen by any testee contributes nothing and should be
revised or replaced.
The steps involved in item analysis are as follows:
1. Administer the test, score the items and arrange the students' scores in order of merit (highest to lowest).
2. Separate the scripts into an upper group and a lower group of equal size (for example, the top and bottom halves
of a small class, or the top and bottom 27% of a large one).
3. Beginning with item number one, count how many students in the upper group (U) got it right. Thereafter,
count how many students in the lower group (L) got the item right.
4. Compute the difficulty index (P) for each item.
5. Compute the discrimination index (D) for each item.
6. Comment on each item, accepting or rejecting it on the basis of its P and D values.
7. Identify the poor items and analyse their item choices or the effectiveness of the distracters.
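The steps above can be sketched in a short program. This is only an illustrative implementation, not the author's own procedure; the U and L counts are those of Example One below, with n = 20 students in each group, and the accept/reject rule assumed here is "difficulty between 20% and 80% with positive discrimination".

```python
n = 20  # students in each of the upper and lower groups

# item number -> (U, L): how many in each group got the item right
items = {
    1: (18, 16), 2: (14, 9), 3: (5, 2), 4: (20, 20),
    5: (14, 11), 48: (8, 16), 49: (12, 10), 50: (10, 5),
}

def difficulty(U, L, n):
    """P = (U + L) / N, where N is the total number in both groups."""
    return (U + L) / (2 * n)

def discrimination(U, L, n):
    """D = (U - L) / n, where n is the number in each group."""
    return (U - L) / n

for no, (U, L) in items.items():
    P = difficulty(U, L, n)
    D = discrimination(U, L, n)
    # Accept items in the 20%-80% difficulty range with positive discrimination
    verdict = "Accept" if 0.20 <= P <= 0.80 and D > 0 else "Reject"
    print(f"Item {no}: P = {P:.2f}, D = {D:.2f} -> {verdict}")
```

Running this reproduces the accept/reject decisions shown in Table 1 for Example One.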
Example One
Items 1, 2, 3, 4, 5, ….. 48, 49 and 50 on a Mathematics achievement test were passed by the following numbers of
"upper" and "lower" students:

Item No               1    2    3    4    5   …..  48   49   50
20 "Upper" Students  18   14    5   20   14   …..   8   12   10
20 "Lower" Students  16    9    2   20   11   …..  16   10    5

Obtain the item difficulty and item discriminating power per item and comment.
Table 1: Solution to Example One

Item No   U (20)   L (20)   P = (U+L)/40   D = (U–L)/20   Comment
1           18       16         0.85           0.10       Too easy (Reject)
2           14        9         0.58           0.25       Good (Accept)
3            5        2         0.18           0.15       Too difficult (Reject)
4           20       20         1.00           0.00       Too easy (Reject)
5           14       11         0.63           0.15       Good (Accept)
48           8       16         0.60          –0.40       Negative discrimination (Reject)
49          12       10         0.55           0.10       Good (Accept)
50          10        5         0.38           0.25       Fair (Accept)

Choice of Options
                    A    B    C    D    E    Total
Upper Group (20)    4    6    1    8    1     20
Based on the calculations in Table 1, items 2, 5, 49 and 50 are satisfactory because their difficulty levels and
discriminating powers are within the acceptable ranges. Item 3 should be rejected because it is too difficult.
Similarly, items 1 and 4 are too easy and should be rejected. Even though item 48 has a good difficulty level, its
discrimination index is negative; it should therefore be discarded. The examiner should undertake item choice
analysis to determine why items 4 and 48 recorded zero and negative discriminating power respectively. Either
the questions were vague or the examiner may have used a wrong key in marking the items.
Revision Exercise
1. Items 1, 2, 3, 4, 5, ….. 46, 47, 48, 49, 50 on a Mathematics achievement test were respectively passed by
the following numbers of "upper" (U) and "lower" (L) students:

Item No   1    2    3    4    5   …..  46   47   48   49   50
(U)       …
(L)       …

Obtain the item difficulty and item discriminating power per item and comment.
STATISTICS
Statistics refers to the mathematical techniques used in gathering, organising, analysing and interpreting numerical
data. Raw test data are meaningless unless they are statistically treated. The statistical methods could be
descriptive or inferential.
Types of Statistics
1. Descriptive Statistics
Descriptive statistics comprises those methods concerned with collecting and describing a set of data to yield
meaningful information. It provides information only about the collected data and in no way draws inferences or
conclusions concerning a larger set of data. Descriptive statistics is concerned with the numerical description of a
particular group, and no conclusions are extended beyond the group described. The data describe one group and that
one group only. If a teacher is only interested in a description of the performance of a specified class on a particular
test, and not in further generalisation, he or she is dealing with a problem of descriptive statistics. The presentation of
data in the form of graphs and tables also falls under the heading of descriptive statistics. Descriptive statistics has
three classes of measures, namely: measures of central tendency, measures of variability and measures of relationship.
2. Inferential Statistics
Inferential statistics comprises those methods concerned with the analysis of a subset of data leading to predictions or
inferences about the entire set of data. In inferential statistics, the teacher selects a sample that is representative of
the population and uses the information obtained from the sample to make inferences about the parameters of the
population. Examples of inferential statistics include Student's t-test and the F-test.
In this course material, we shall consider the following methods of descriptive statistics: frequency distributions,
graphical representations (bar and pie charts), measures of central tendency, measures of variability and measures
of relationship.
Example: The following are the test scores of 50 students:

69 74 67 74 66 70 68 71 75 80
80 72 69 75 68 76 74 74 73 73
73 76 72 72 72 65 64 75 65 70
71 73 71 77 64 66 72 70 78 77
72 71 71 72 78 68 73 76 67 70
Example: Prepare a grouped frequency distribution for the following 50 test scores:
84 82 70 72 80 62 94 86 68 68
77 89 85 86 46 48 84 88 89 78
86 57 81 70 55 88 79 69 52 61
68 50 77 90 77 78 89 81 67 91
58 73 77 80 78 76 76 83 72 78
Guidelines
1. Obtain the range of the scores by subtracting the lowest score (L) from the highest score (H) in the array.
2. Select the desired class width, preferably an odd number.
3. Determine the number of class intervals by dividing the range by the desired class width.
4. Place the class interval containing the highest score values at the top and the one containing the lowest score
values at the bottom.
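The guidelines above can be sketched as a short program, here applied to the second data set. The class width of 5 and the starting point of the first interval are assumed choices for illustration; the text itself only asks for an odd class width.

```python
scores = [
    84, 82, 70, 72, 80, 62, 94, 86, 68, 68,
    77, 89, 85, 86, 46, 48, 84, 88, 89, 78,
    86, 57, 81, 70, 55, 88, 79, 69, 52, 61,
    68, 50, 77, 90, 77, 78, 89, 81, 67, 91,
    58, 73, 77, 80, 78, 76, 76, 83, 72, 78,
]

width = 5                       # step 2: chosen class width (odd)
low, high = min(scores), max(scores)
rng = high - low                # step 1: range = H - L
n_classes = -(-rng // width)    # step 3: range / width, rounded up

# Build class intervals from the bottom up, aligning the first interval
# on a multiple of the class width (an assumed convention).
start = low - (low % width)
intervals = [(s, s + width - 1) for s in range(start, high + 1, width)]
table = [(a, b, sum(a <= x <= b for x in scores)) for a, b in intervals]

# Step 4: print with the highest class interval on top
for a, b, f in reversed(table):
    print(f"{a}-{b}: {f}")
```

Every score falls into exactly one class, so the frequencies sum to 50.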
With the aid of a protractor and the semicircle found in the mathematical set, one can carefully draw a pie chart. The
bar and pie charts for the data in Table 9-3 are presented below.
Bar Chart
[Figure: bar chart of enrolment rates (scale 0 to 140) for the courses Edu. Adm, Adult Educ, G & C, Bus. Edu and Chem Edu.]

Pie Chart
[Figure: pie chart of the same enrolment data; the Chem Edu segment is labelled 30.]
(i) Mean
The mean is the sum of all the scores in the data set divided by the number of scores. The mean is denoted by X̄
(pronounced "X bar"). The formula for computing the mean of ungrouped data is given below.

X̄ = ΣX / n

Where:
ΣX = sum of all the scores
n = number of scores

Example: Find the mean of 10, 12, 8, 11, 12, 13, 9 and 7.

X̄ = (X1 + X2 + X3 + X4 + X5 + … + X8) / 8
   = (10 + 12 + 8 + 11 + 12 + 13 + 9 + 7) / 8
   = 82 / 8
   = 10.25

Advantages of the mean:
1. The mean takes account of all the score values in a set of data; it also reflects the value of extreme scores in
the data.
2. When further calculations are required, the mean is the average that lends itself to further mathematical
manipulation.
(ii) Median
The median is the middle observation of a data set when ordered from the smallest to the biggest. When a set of
data is arranged in order of magnitude, the median is the score that cuts the data into two halves. If there are two
observations in the middle (this happens when the number of scores is even), the median is the average of the two.

Example: Find the median of 12, 14, 16, 18, 20 and 22.
Solution: 12 14 16 18 20 22 (note that the number of scores is even). The median is the average of the
two scores in the middle, i.e., (16 + 18)/2 = 17.

The median is preferred:
1. When the purpose is to find an average that is not affected by the extreme figures in the distribution.
2. When the curve is positively skewed.
(iii) Mode
The mode of a set of scores is the score that has the highest frequency. In grouped data, the mode is the midpoint
of the class interval (modal class) that has the highest frequency.
Example: in a set of scores in which 3 occurs more often than any other score, the answer is 3 because it is the
commonest observation, that is, the score with the highest frequency.
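The three measures of central tendency above can be computed with Python's standard statistics module; the mean and median data below are the worked examples from the text, while the mode data set is hypothetical.

```python
import statistics

# Mean example: 10, 12, 8, 11, 12, 13, 9, 7  ->  82 / 8 = 10.25
mean_data = [10, 12, 8, 11, 12, 13, 9, 7]
print(statistics.mean(mean_data))      # 10.25

# Median example: even number of scores, so the median is the
# average of the two middle scores: (16 + 18) / 2 = 17.0
median_data = [12, 14, 16, 18, 20, 22]
print(statistics.median(median_data))  # 17.0

# Mode: the most frequent score (hypothetical data for illustration)
mode_data = [2, 3, 3, 1, 5, 3, 4]
print(statistics.mode(mode_data))      # 3
```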
D. Measures Of Variability
The measure of variability for a set of scores (also called measure of spread, scatter or dispersion) tells us the extent
to which the scores deviate from one another. There are three common measures of variability. These are the range,
quartiles and the standard deviation.
(i) Range
The range is the difference between the highest (H) and lowest (L) scores in an array.

Example: For a set of scores whose highest value is 16 and lowest value is 4,
Range = H – L = 16 – 4 = 12
Just like the mode, the range provides us with a quick rough check of the spread or scatter of the scores. However,
its information only concerns the highest and lowest scores ignoring the other scores in the group.
(ii) Quartiles
Quartiles are the points that divide a frequency distribution into equal fourths. A quartile is a point on a distribution
below which 25%, 50% or 75% of the scores fall. If a data set is ordered from the least to the highest, the values that
divide the data into four equal sets are called the quartiles, denoted Q1, Q2 and Q3. Q1 (or P25) is the lower
quartile, which cuts off 25% of the data from below. Q2 (or P50, the median) is the second quartile, which cuts off 50%
of the data from below. Q3 (or P75) is the upper quartile, which cuts off 75% of the data from below.
The score distance between the third and first quartiles (Q3 and Q1) is called the interquartile range. The semi-
interquartile range (Q) is half of the interquartile range. As a measure of variability, the semi-interquartile range is the
average distance from the median to the first and third quartiles, i.e., it tells how far the quartile points lie from the
median, on the average. The semi-interquartile range is

Q = (Q3 – Q1) / 2

The semi-interquartile range gives us information about the variability within the middle 50% of the scores. In a
situation where the median is the measure of central tendency, the appropriate measure of variability is the
semi-interquartile range.
Example: Find the 1st, 2nd and 3rd quartiles of the following scores:
5 6 12 10 20 2 3 9 18 22 5 13
Before computing the quartile values, first arrange the scores in order of magnitude as shown below:
2 3 5 5 6 9 10 12 13 18 20 22.
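One common way to complete the example is to take Q2 as the median of the whole set and Q1 and Q3 as the medians of the lower and upper halves; the sketch below assumes that method. Other textbooks use slightly different interpolation rules, which can give somewhat different Q1 and Q3 values.

```python
def median(xs):
    """Middle score, or the average of the two middle scores."""
    xs = sorted(xs)
    n = len(xs)
    mid = n // 2
    return (xs[mid - 1] + xs[mid]) / 2 if n % 2 == 0 else xs[mid]

def quartiles(xs):
    """Q1 and Q3 as medians of the lower and upper halves; Q2 as the median."""
    xs = sorted(xs)
    half = len(xs) // 2
    return median(xs[:half]), median(xs), median(xs[-half:])

scores = [5, 6, 12, 10, 20, 2, 3, 9, 18, 22, 5, 13]
q1, q2, q3 = quartiles(scores)
print(q1, q2, q3)          # 5.0 9.5 15.5

semi_iqr = (q3 - q1) / 2   # Q = (Q3 - Q1) / 2
print(semi_iqr)            # 5.25
```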
E. Measures Of Relationship
Correlation
Correlation is the relationship between two or more paired variables, that is, two or more sets of data. A teacher may
be interested in finding out whether a relationship exists between students' achievement in, say, Mathematics and
Physics, or between Chemistry achievement in the mock examination and the Senior School Certificate Examination
(SSCE). The relationship could be negative, zero or positive. A negative relationship means that an increase in
variable x is likely to lead to a decrease in variable y, and vice versa. On the other hand, a positive relationship
implies that an increase in the value of variable x is likely to lead to an increase in variable y, and vice versa.
The degree of relationship is represented by the correlation coefficient. This coefficient is denoted by either the Greek
letter ρ (rho) or the symbol r, depending upon certain assumptions about the distribution and the method used to
compute the coefficient. The resulting coefficient can range from –1.0 to +1.0, where 0 indicates no correlation
between the data sets. A correlation of +1.0 or –1.0 means that one data set can always be predicted exactly from
the values in the other. The correlation coefficient can be computed using either Pearson's product-moment method
or Spearman's rank-difference method.
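Pearson's product-moment coefficient can be sketched directly from its deviation-score definition. The paired mock and SSCE scores below are hypothetical data invented for illustration only.

```python
import math

def pearson_r(x, y):
    """r = sum((x - mx)(y - my)) / sqrt(sum((x - mx)^2) * sum((y - my)^2))"""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) *
                    sum((b - my) ** 2 for b in y))
    return num / den

mock = [40, 55, 60, 70, 85]   # hypothetical mock Chemistry scores
ssce = [45, 50, 65, 72, 88]   # hypothetical SSCE Chemistry scores
print(round(pearson_r(mock, ssce), 2))   # 0.97 (a strong positive relationship)
```

A coefficient near +1 here reflects that students who did well in the mock also tended to do well in the SSCE.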
Standard Scores
Raw scores are frequently transformed to other scales to facilitate analysis and interpretation. One such example is
the standard score. The standard score expresses a person’s performance in terms of its deviation from the mean in
standard deviation units. Standard scores have several advantages.
a) First, they measure on an interval scale. By expressing performance in terms of standard deviation units, we
have transformed raw scores to a scale with equal-sized units.
b) Second, the use of standard scores allows us to compare scores from several tests directly, even those
having different means and/or standard deviations.
In this course material, we shall consider three types of standard scores and scales: sigma (z) score, standard Z or T
score, and stanine scale.
The sigma (z) score is computed as

z = (X – M) / S

Where:
X = raw score
M = mean
S = standard deviation
Example:
A teacher wished to obtain a student's equally weighted average on a Mathematics and a Geography test. Using the
table below, determine the subject in which the student performed better.

Subject        Score   Mean   …    SD
Mathematics     45      48    60    5
Geography       62      55    86    6

Mathematics z-score: z = (45 – 48)/5 = –0.6
Geography z-score: z = (62 – 55)/6 = +1.16

The score of 45 in Mathematics may be expressed as a z-score of –0.6, indicating that 45 is 0.6 standard deviations
below the mean. The score of 62 may be expressed as a sigma score of +1.16, indicating that 62 is 1.16 standard
deviations above the mean. Hence, the student performed better in Geography.
The standard Z or T score is obtained from the transformation T = 10z + 50. Thus:
Mathematics T = 10(–0.6) + 50 = –6 + 50 = 44
Geography T = 10(1.16) + 50 = 11.6 + 50 = 61.6
The stanine (also called standard nine) is a method of reporting achievement test grades. Instead of using letters as
identification marks for the grades, it uses the numbers 1, 2, 3, 4, 5, 6, 7, 8 and 9. The stanine is consequently a
nine-point scoring scale running from 9 down to 1. The average score on any test normally falls into stanine 5.
The stanine is recommended only where the range of marks or groups of marks in the distribution is wide enough,
preferably twenty or more, and the number of pupils tested is large. The only limitation in the use of stanine scores for
computation is the situation where there are ties in the raw scores; too many ties make the theoretical application of
the formula difficult. The following is the percentage of scores for each stanine:
First 4% = stanine 9
Next 7% = stanine 8
Next 12% = stanine 7
Next 17% = stanine 6
Next 20% = stanine 5
Next 17% = stanine 4
Next 12% = stanine 3
Next 7% = stanine 2
Last 4% = stanine 1
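The score transformations above can be sketched together: z and T scores applied to the Mathematics and Geography example, and a stanine lookup built from the 4-7-12-17-20-17-12-7-4 percentage bands. The percentile-rank input to the stanine function is an assumed interface for illustration.

```python
def z_score(x, mean, sd):
    """z = (X - M) / S"""
    return (x - mean) / sd

def t_score(x, mean, sd):
    """T = 10z + 50"""
    return 10 * z_score(x, mean, sd) + 50

print(round(t_score(45, 48, 5), 2))   # Mathematics: 44.0
print(round(t_score(62, 55, 6), 2))   # Geography: 61.67

def stanine(percentile_from_top):
    """Map a percentile rank counted from the top (0-100) to a stanine."""
    # Cumulative percentage cut-offs for stanines 9 down to 1
    bands = [(4, 9), (11, 8), (23, 7), (40, 6), (60, 5),
             (77, 4), (89, 3), (96, 2), (100, 1)]
    for cum_pct, s in bands:
        if percentile_from_top <= cum_pct:
            return s

print(stanine(2))    # in the first 4% from the top -> stanine 9
print(stanine(50))   # middle of the distribution  -> stanine 5
```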