
FED 313: COURSE MATERIAL

Definition Of Basic Terms

Education

Education is concerned with the handing on of beliefs, moral standards, knowledge and skills. It is about change – change in the
learner’s intellectual capacity, ability to manipulate fine and gross muscles, changes in his attitudes, interests, values, beliefs,
interpersonal relationship skills, and character. The school is accountable to the society. The parents need feedback on the
educational attainments of their children. Above all, an understanding of the pupils’ progress in school helps the teachers to
appraise their methods of teaching, the effectiveness of instructional materials, and the extent to which the objectives of a particular
course of instruction have been achieved. In this material, the term assessment will be used synonymously with measurement.

Assessment

Assessment, according to Shertzer and Linden (1979), refers to the procedures and processes employed in collecting information
about or evidence of human behaviour. The measure of certain dimensions or characteristics of human beings is obtained through
assessment instruments such as educational and psychological tests and inventories.

Measurement

Chase (1978) defined measurement as the process of using numbers to describe quantity, quality or frequency according to a set
of rules. It involves the assigning of numbers to attributes or characteristics of persons, objects or events according to explicit
formulation or rules. The purpose of measurement is to collect quantitative information about the existence of a specified
attribute in a given object, person or event. In measurement, we ask the question: How much? In educational measurement, we
try to quantify the attributes of pupils according to specified rules. What is measured are the attributes or behavioural
characteristics of pupils, not the pupils themselves.

Evaluation

Evaluation is the process of making value judgements for the purpose of decision-making. It is simply a process through which
value judgement or decisions are made from a variety of observations or test results. It also involves the inspection of all
available information concerning the student, teacher and entire educational programme for the purpose of making valid
judgements about the degree of change in students and the effectiveness of the educational programme. In evaluation, you
make value judgements based on the quantitative information provided by the measurement instruments. You do this by
saying, for example, pass or fail, enough or not enough, satisfactory or unsatisfactory, etc.

Types of Measurement Instruments

The measurement instruments are classified according to the three domains of behaviour namely, cognitive, affective and
psychomotor.

(a) Instruments for Measuring Cognitive Behaviour - Examples include teacher-made achievement tests,
standardised achievement tests, intelligence tests, aptitude tests, and teacher ratings.

(b) Instruments for Measuring Affective Behaviour - Examples include attitude scales, interest inventories, personality tests
and sociometric tests.

(c) Instruments for Measuring Psychomotor Behaviour - Examples include performance tests, observational checklists and
rating scales.

Rating Scales

Rating scales are scales for rating each of the characteristics or activities one is seeking to observe or assess. They enable an
observer to systematically observe an individual and to record those observations. Rating scales provide a list of characteristics
(work products, events in a series), and these are to be judged on a scale that reflects the degree to which the desired qualities
or quantities were evident.

Checklists

Checklists usually contain lists of behaviours or characteristics that are either present or absent. A checklist does not require the observer to
indicate the degree or extent to which a characteristic is present. Checklists are used to assess products, skills in use of tools
and materials. The observer inspects the product or process and notes the presence or absence of each item on the checklist.
Checklists are used primarily where we wish to observe whether an event is done or not done, an element is present or not
present.

Intelligence Tests

These are batteries of tests used to determine a person’s level of intelligence. Intelligence tests are standardised tests designed
to assess a person’s functional and adaptive capabilities in various settings and situations. Intelligence tests throw more light
on an individual’s general mental ability to reason and capacity to learn. They are useful in identifying students in need of special
attention in school, in diagnosing cognitive difficulties, and in helping people make optimal educational and vocational choices.
They measure the extent to which an individual's innate potential has been modified or developed within his or her
environment.

Aptitude Tests

Aptitude tests are those tests that measure an individual's potential to achieve in a given activity or to learn to achieve in that
activity (Gibson and Mitchell, 1979). They attempt to predict the degree of achievement that may be expected from individuals
in a particular activity. The purpose of aptitude testing is to predict how well an individual will perform on some criterion (such
as school grades, teacher’s ratings or job performances) before training or instruction is begun or selection or placement
decisions are made.

Personality Tests

Personality tests are instruments for measuring the affective or non-intellectual aspects of behaviour for personal counselling.
They are used to measure such aspects of personality as emotional stability, friendliness, motivation, dominance, interests,
attitude, leadership, self-concept, sociability, and introversion-extroversion. Examples of personality tests include the Mooney
Problem Checklist, Edwards Personal Preference Schedule, Edwards Personality Inventory, Tennessee Self-Concept Scale, Students
Problem Inventory, Minnesota Multiphasic Personality Inventory (MMPI) and projective tests.

Attitude Scales

Attitude scales are self-report inventories designed to measure the extent to which an individual has favourable or
unfavourable feelings toward some person, group, object, institution or idea. They are used where an individual has little
reason for distorting the results. Examples include the social distance, Thurstone and Likert scales.
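As an illustration of how a Likert-type scale is scored, the sketch below sums item scores on a five-point scale and reverse-scores negatively worded items. The response codes, item numbers and scoring convention are assumptions made for illustration, not taken from any published instrument.

```python
# Scoring a simple 5-point Likert attitude scale (illustrative sketch).

# Response categories mapped to scores for positively worded items.
POSITIVE = {"SA": 5, "A": 4, "U": 3, "D": 2, "SD": 1}
# Negatively worded items are reverse-scored.
NEGATIVE = {k: 6 - v for k, v in POSITIVE.items()}

def score_likert(responses, negative_items):
    """Sum item scores; items listed in negative_items are reverse-scored."""
    total = 0
    for item, answer in responses.items():
        key = NEGATIVE if item in negative_items else POSITIVE
        total += key[answer]
    return total

# One respondent's answers to four hypothetical attitude items;
# items 2 and 4 are assumed to be negatively worded.
answers = {1: "SA", 2: "D", 3: "A", 4: "SD"}
print(score_likert(answers, negative_items={2, 4}))  # prints 18
```

Reverse-scoring keeps a high total consistently meaning a favourable attitude, whichever way an item happens to be worded.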

Interest Inventories

Interest inventories attempt to yield a measure of the types of activities that an individual tends to like and choose. The interest
inventories are commonly used to measure students’ vocational or occupational preferences. The vocational interest
inventories comprise the activities and performance of people in different occupations. The individual is required to indicate his
interest in those activities. This is followed by the assignment of an empirically determined weight to the individual’s responses.
The performance of the individual on the inventory is then compared with those who have been successful in the occupations.

Achievement Tests

An achievement test is a test that measures the extent to which a person has "achieved" something, acquired certain
information, or mastered certain skills - usually as a result of planned instruction or training (Mehrens and Lehmann, 1978).
Achievement tests include all kinds of devices used primarily to assess how well the instructional objectives have been attained.
Achievement tests are designed to measure the degree of students’ learning in specific curriculum areas common to most
schools, such as mathematics, English usage, and reading. The test may be a paper-and-pencil device administered to all children at
one time or a set of oral questions administered individually.

Types of Achievement Tests


The Educational Testing Service (Shertzer and Linden, 1979) classified achievement tests into three types:

1. End of course achievement tests that measure specifically what a student has learned in a particular subject.
2. General achievement tests that cover a student’s learning in a broad field of knowledge and can be given to students
who have taken quite different courses of study within a field.
3. Tests that measure the critical skills a student has learned and his or her ability to use these skills in solving new
problems.
Achievement tests can also be classified according to standardisation (teacher-made or standardised test) or reference
(norm versus criterion referenced test) or type of measure.

Basic Steps In Constructing Teacher-Made Achievement Tests

Two major forms of written classroom tests are available for general use. These are essay tests and objective tests. It is
therefore imperative that every teacher be proficient in constructing achievement tests in his subject area.

In constructing teacher-made tests, Sax (1980) outlined the following activities or procedure:

1. Determine the reason for testing (placement, diagnosis, feedback, etc.).


2. Determine and write the objectives to be met by the test.
3. Determine the best type of item that will meet the objectives of the test.
4. Develop a test blueprint or table of specifications.
5. Determine the answer key, scoring criteria, and directions for administration of the test (time limits, answer sheet
format, etc.).
6. Determine the sequencing of item presentation.
7. Determine the best way to present items (oral presentation, writing the items on the black board, print on paper, etc.).
8. Administer the test.
9. Score the test according to the answer key and scoring criteria.
10. Analyse the items to determine item ambiguity, miskeying, difficulty level, discrimination index, etc.
11. Analyse the test as a whole (determine reliability and validity of the test).
12. Evaluate student progress in relation to the objectives of testing.
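Step 10 above mentions difficulty level and discrimination index. A minimal sketch of both computations follows, using the common upper-and-lower-group convention; the figures are invented for illustration.

```python
# Item analysis for one test item: difficulty level (p) and
# discrimination index (D), comparing upper and lower scoring groups.

def difficulty(correct, total):
    """Proportion of testees who answered the item correctly."""
    return correct / total

def discrimination(upper_correct, lower_correct, group_size):
    """D = (Ru - Rl) / n for equal-sized upper and lower groups."""
    return (upper_correct - lower_correct) / group_size

# Suppose 40 students took the test and 24 answered this item correctly:
# 15 of the top 20 scorers and 9 of the bottom 20 scorers.
p = difficulty(24, 40)
D = discrimination(15, 9, 20)
print(p, D)  # prints 0.6 0.3
```

An item answered correctly by about half the testees (p near 0.5) with a positive D is usually the kind retained for the final test.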

Stating Instructional Objectives

An instructional objective is a statement that describes in behavioural terms what the student should be able to do, the
conditions under which the task is to be performed, and the criterion for acceptable performance. It describes in behavioural
terms what the student should be able to do after completing a prescribed unit of instruction. To achieve this, we make use of
suitable action verbs to phrase the objectives. An instructional objective statement has the following elements: who, action,
product, conditions and standards for minimum acceptance.
Examples of Well-Stated Instructional Objectives

At the end of the lesson, the students should be able to:

1. Demonstrate and play the underarm serve in volleyball with about 80% accuracy as judged by the teacher.
2. Solve accurately at least eight out of ten questions on set theory.
3. Recall without error, the formula for computing the rank difference correlation coefficient.
4. Enumerate without the aid of the notebook, at least six uses of the ocean basin.
5. Discuss, in thirty minutes or less, at least three ways the baptism of John differed from that of Jesus Christ.
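Objective 3 above refers to the rank difference correlation coefficient. Its formula, rho = 1 - 6*sum(d^2) / (n*(n^2 - 1)), where d is the difference between paired ranks, can be sketched as follows; the pupils' ranks are invented for illustration.

```python
# Rank difference (Spearman) correlation from paired ranks.

def rank_difference_rho(ranks_x, ranks_y):
    """rho = 1 - (6 * sum of squared rank differences) / (n * (n^2 - 1))."""
    n = len(ranks_x)
    d_squared = sum((x - y) ** 2 for x, y in zip(ranks_x, ranks_y))
    return 1 - (6 * d_squared) / (n * (n ** 2 - 1))

# Five pupils ranked on two tests.
maths_ranks = [1, 2, 3, 4, 5]
english_ranks = [2, 1, 4, 3, 5]
print(rank_difference_rho(maths_ranks, english_ranks))  # prints 0.8
```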

Developing a test blueprint

A test blueprint is a table which relates outcomes to content and indicates the relative weight to be given to each of the various
areas. The purpose of the table is to provide assurance that the test will measure a representative sample of the learning
outcomes and the subject-matter content to be measured.

The preparation of a table of specifications includes the following steps:

1. Identify the learning outcomes and content to be measured by the test.


2. Weight the learning outcomes and content areas in terms of their relative importance.
3. Build the table in accordance with these relative weights by distributing the test items proportionately among the relevant
cells of the table.
The numbers in each cell of the table indicate the number of test items to be devoted to that area. The number of
items assigned to each cell is determined by the weight given to each outcome and each subject-matter area. The blueprint
specifies precisely what weight to give each topic and each instructional objective. In assigning relative weights to each
learning outcome and each content area, a number of factors will enter into the determination:

 How important is each area in the total learning experience?
 How much time was devoted to each area during instruction?
 Which outcome has the greatest retention and transfer value?
 What relative importance do curriculum specialists assign to each area?
In the final analysis, however, the weight assigned in the table should faithfully reflect the emphasis given during
instruction.
Table 1: Test Blueprint for an Objective Test in Chemistry

Content                            Weight   Knowledge  Comprehension  Application  Analysis  Synthesis  Evaluation  Total No. of Items

Gaseous state and laws              10%        2            1             1            1         1           -              6
Acids, bases and salts              20%        2            2             4            2         1           1             12
Electrolysis and redox reactions    15%        2            2             2            1         1           1              9
Rates of chemical reactions         30%        2            3             3            4         3           3             18
Energy effects                      25%        4            3             2            2         2           2             15
Total                              100%       12           11            12           10         8           7             60

Note that in these calculations, the number of items in each content area has been rounded off to the nearest whole number
while keeping as closely as possible to the prescribed percentages.
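Step 3 of blueprint construction, distributing items in proportion to the content weights, can be sketched as follows, using the weights from Table 1 and a nominal total of 61 items.

```python
# Distributing a nominal total of test items across content areas
# according to their blueprint weights, rounding each row to a whole
# number of items (weights taken from Table 1).

weights = {
    "Gaseous state and laws": 0.10,
    "Acids, bases and salts": 0.20,
    "Electrolysis and redox reactions": 0.15,
    "Rates of chemical reactions": 0.30,
    "Energy effects": 0.25,
}
total_items = 61

allocation = {topic: round(w * total_items) for topic, w in weights.items()}
print(allocation)
print(sum(allocation.values()))  # prints 60
```

Rounding each content area independently yields 60 items rather than the nominal 61; in practice the examiner adjusts a cell or two so that the blueprint and the finished test agree.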

Definition of Essay Tests

Essay tests consist of a list of questions for which the subject (student) is required to write out the answer. An essay item is a
question or situation with instruction, which requires the testee to organise a complete thought in one or more written
sentences. The testee is given freedom to generate responses, which must be assessed by a scorer who is knowledgeable in the
subject area.

Essay questions are subdivided into two major types – restricted and extended response, depending on the amount of latitude
or freedom given the student to organise his ideas and write his answer. The amount of restriction in an essay question
depends on the educational level of the testee and the type of information required.

Definition of Objective Tests

Objective tests are tests in which every question is set in such a way as to have only one right answer. The opinion of the
examiner or marker does not come into play in judging whether an answer is good or bad, acceptable or unacceptable, right or
wrong. In other words, there is no subjective element involved. The items are constructed in such a way as to have one
predetermined correct answer.

Short-Answer Items

The short-answer item (also called the supply answer or completion item) presents a task in a sentence in which a word, a
number, a symbol, or a series of words has been omitted. The items call for only one response for a blank or a specific series of
responses for a series of blanks.

The Alternate Choice Items

In the alternate choice item, the students are given two options from which to choose one. Such options include yes-no,
true-false, right-wrong, and correct-incorrect.

The Matching Items

The matching item presents two lists usually called the premises and responses. The premises list consists of the questions or
problems to be answered, while the responses list contains the answers. Generally, the two lists have things in common; for
example, list of authors and books, inventions and inventors, historical events and dates, states and capitals, antonyms and
synonyms, words and their opposites, etc. The students are directed to match each premise with the corresponding response.

The Multiple-choice Items

The multiple-choice item consists of a stem and a branch. The stem presents the problem as either an incomplete statement or
a question, while the branch presents a list of suggested answers (responses or options). There are usually four or five options.
Among the options, only one is the correct answer (or the key). The incorrect options are called distracters. A distracter is a
plausible but wrong answer designed to confuse the student who does not know the correct answer. From the list of responses
provided, the student is required to select the correct (or best) one.
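The parts of a multiple-choice item named above (the stem, the options, the key and the distracters) can be represented in a simple structure; the item below is invented for illustration.

```python
# A multiple-choice item represented as a small dictionary:
# stem, lettered options, and the key (the correct option).
item = {
    "stem": "Which gas is most abundant in the atmosphere?",
    "options": {"A": "Oxygen", "B": "Nitrogen",
                "C": "Carbon dioxide", "D": "Argon"},
    "key": "B",
}

# The distracters are simply the options other than the key.
distracters = [letter for letter in item["options"] if letter != item["key"]]
print(distracters)  # prints ['A', 'C', 'D']
```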

PROCEDURE FOR ADMINISTERING TESTS

What Is Test Administration?


Test administration refers to the process and procedures employed in the task of giving a test (Olusakin and Ubangha, 1997). It
applies not only to the act of handing out and collecting test materials but also to the more general matters associated with it,
such as the preparation of examiners, testing conditions, the examinees’ physical and psychological condition, examiner-examinee
rapport, and every other detail of the testing act. To ensure objective and realistic assessment of students’ learning, two things
are paramount. First, the preparations must be such as would guarantee students’ comfort and a hitch-free examination
conduct. Secondly, the prescribed rules and regulations governing examination conduct must be adhered to. By so doing, no
candidate gains an undue advantage over others or gets a grade that does not represent his or her level of attainment (Okoli,
1997).

A well-prepared examiner should be aware of the multiplicity of factors entering into test performance, the kind of
environmental conditions, and the special problems created by the current tendency to administer tests with separate answer
sheets. He should also have at hand a list of procedures to be followed in preparing for the test, during the test, and after the
test has been given.

General Preparations before Administering a Teacher-Made Achievement Test

The following are general preparations to make before administering a teacher-made achievement test.

1. The students should be informed as early as possible about the specific date of the test so that they can plan ahead in view
of their other commitments. A test that is announced early will provide opportunity for quality preparation and keep
students’ anxieties at acceptable levels.
2. The physical conditions should be as comfortable as possible and the students should be as relaxed as possible.
3. The examination venue must be well ventilated and free from distractions.
4. Provide sufficient writing desks and chairs for the students.
5. The sitting arrangement must be such that it ensures free movement of invigilators. The space between rows should be wide
enough to prevent copying through giraffing.
6. Provide enough question papers and answer scripts.
7. Provide for first aid in case of emergency.
8. Arrange for sufficient invigilators. The invigilators must be people of proven integrity. If possible, provide for male and
female supervisors. Both genders are needed in case a student wants to go to the toilet or during bodily searches while
entering the examination hall.
9. Arrange for security personnel to forestall any possible breach of peace by unruly students.
10. Provide a wall clock for time keeping.

General Guidelines for Actual Test Conduct


1. The students should be motivated to do their best. They should be convinced in their minds that the results will be of value
to them and that they will help identify their strengths and weaknesses.
2. Before the test begins, the teacher should reassure his students, put them at ease, and answer any questions they may
have. Once the test has begun, the teacher must ensure that distractions are kept to a minimum.
3. When administering the test, the teacher should make sure that the students understand the directions.
4. Read out the instructions to the hearing of all the students. Uniform instruction should be given to the students.
5. The teacher should keep the students informed of the time remaining (for example, by writing the time on the blackboard at,
say, every thirty minutes).
6. As much as possible, minimise the practice of announcing time at frequent intervals as this heightens anxiety in students.
Moreover, time management is part of examination.
7. In the course of supervision, the teacher should not stand or hover too long over a student. This practice may make a
student become nervous and lose concentration.
8. There should be proper identification of candidates at the gate.
9. The invigilators and supervisors should be encouraged to maintain close supervision of test environment.

MARKING OBJECTIVE AND ESSAY TESTS

Marking Objective Tests

The factors to be taken into consideration in the marking of objective tests include the scoring formula to be employed, the
weighting of items and parts of the test, the kinds of provision for responses, and the type of keys to be used. Objective tests are
very easy to mark. The most common practice is to mark only the correct answer (or key). Here each item on the test carries
equal marks, usually one point. An individual’s total score is the sum of the items marked right. Objective tests can be hand-
scored or machine-scored.
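Hand-scoring against an answer key, with each item carrying one mark, can be sketched as follows; the key and the candidate's responses are invented for illustration.

```python
# Scoring an objective test: one point per response matching the key.

answer_key = ["B", "D", "A", "C", "B"]

def score(responses, key=answer_key):
    """Total score is the number of responses that match the key."""
    return sum(1 for given, correct in zip(responses, key) if given == correct)

# One candidate's responses to the five items.
print(score(["B", "D", "C", "C", "A"]))  # prints 3
```

Machine scoring follows exactly the same logic, with the comparison done by an optical reader instead of the teacher's eye.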

Marking Essay Questions

The effectiveness of an essay examination depends on how well it is graded. In grading the essay responses, one must use
appropriate methods to minimise biases, pay attention only to the significant and relevant aspects of the answer, be careful not
to let personal idiosyncrasies affect grading, and apply uniform standards to all the papers (Mehrens and Lehmann, 1978).
There are two major approaches to marking essay questions – the general impression and point-score methods.

General Impression Method

As the name implies, this method relies mainly on the teacher’s impression of the extent to which a student’s answer fits the ideal
answer in his mind. It does not make use of a written marking scheme. The general impression method is suitable for marking
long essays such as compositions. The use of this method to mark essay questions is based on some assumptions. According to
Nwana (1982), this technique assumes that:

1. There is an accepted body of knowledge which, put together, will form the answer to the question.
2. The examiner has a good command of this body of knowledge and can identify it in the pupil’s answer.
3. The teacher’s impression of the answer is unaffected by his previous knowledge, or lack of it, with particular reference
to the topic under question.
The general impression method has two procedures - impressionistic marking of one question at a time, and the sorting or
global technique.

Impressionistic Marking of One Question at a Time

In this procedure, the teacher or examiner reads the answer to a question, forms a general impression of how it fits an ideal
answer in his mind, and thereafter awards a mark out of the maximum assigned to that question. There is no written marking
scheme; rather, the major points expected in the answer exist in the memory of the examiner.

Impressionistic Marking through the Sorting Technique

In the sorting technique (sometimes called the global method), the examiner does not read every section or main idea
before arriving at how many marks the student is to receive on a question; rather, he reads the entire answer and estimates
the overall quality of the response. The answer scripts are placed or sorted into grade piles according to the varying qualities of
the responses and the levels of discrimination needed.

After the scripts are initially sorted, there is a need to re-read those scripts in each pile to ensure homogeneity. As a final
check, a third reading is undertaken to ensure that the scripts in each pile are all at the same level. At this stage, a script may be
moved up or down as the case may be. The purpose is to sort in such a way as to maximise the differences between groups and
minimise the differences within each group.

After the marker is satisfied that the scripts are properly sorted (or categorised), he then proceeds to award the final mark
by assigning the same grade to every script in a pile. For example, the scripts in pile A are awarded an A grade, while those in
the second pile are awarded a B grade and so on.

Point Score Method

In the point-score method (sometimes called the analytical method), the examiner writes out a detailed list of major points to
be covered by the answer. These points constitute the ideal or model answer. The ideal or model answer is broken down into
specific points. The student's score is based upon the number of points contained in his answer. In addition, component
parts such as "effectiveness of expression," "logical organisation" and "support of statements" are specified and assigned points
or values. By so doing, we end up with a checklist that can be used quite objectively.

This scoring method is used in marking essays in public examinations such as the General Certificate of Education (G.C.E.)
and Senior School Certificate Examination (SSCE). The point-score technique has three major stages – developing a marking
scheme, the marking proper, and totalling of points.

Developing a marking scheme

The marking scheme consists of the model answer, that is, the essential points to each question. As much as possible, the major
points relevant to the question should be included in the marking scheme. Also contained in the marking scheme is the relative
number of marks to be awarded to each point. The examiner should decide in advance what marks (if any) to award to
such factors as grammar, spelling, punctuation, neatness, presentation and handwriting. Allowance should also be made for
originality, that is, relevant points given by the candidates which are not contained in the model answer.

Marking proper

At this stage, the teacher reads the responses to a particular question and gives points for those component parts contained in
the answer. He begins by marking one question at a time for all those that attempted that question, using the marking scheme
as a guide. Thereafter, he marks the remaining questions, one at a time. Marking one question at a time increases objectivity
and enables the examiner to maintain a consistent mental set about the relevant points in the model answer to that question.
The examiner is expected to cover the student’s name before marking his script. In addition, the examiner should try not to
look at the scores of previously marked questions when evaluating the remaining questions.

Totalling the points

At this final stage, the examiner totals the points awarded to each answer script.

An English Language Question

Your senior brother studying in the U.S.A. has just written home saying he would like to spend the next Christmas holidays at
home. Write him a reply briefly describing, among other things, the preparation being made to receive him, and two recent
events at home which have brought some changes to the family.

Suggested Marking Scheme

1. Content (5 marks)

(i) Introductory greetings

(ii) Family’s reaction to news of his coming home; welcome arrangements (accommodation, welcome dance, etc.)

(iii) Two recent events at home

(iv) Closing remarks

2. Organisation (4 marks)

Clear setting of writer’s address, date; paragraphing, opening and closing of letter; addressing the envelope.

3. Mechanical accuracy (5 marks)


Grammar, spelling, and punctuation

4. Communication and expression (6 marks)

An informal letter to a brother studying overseas should be informative and conversational in tone (simple precise
language, correct choice of words and expressions).

Total Score = 20 marks
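Totalling the component marks for one script under the scheme above can be sketched as follows. The marks awarded are invented, and the dictionary keys are shorthand for the four headings of the scheme.

```python
# Point-score totalling for one script: a mark per component,
# capped by that component's maximum from the marking scheme.

scheme_maxima = {"content": 5, "organisation": 4,
                 "mechanical accuracy": 5, "communication": 6}

def total_score(awarded, maxima=scheme_maxima):
    """Sum the marks awarded, never exceeding each component's maximum."""
    return sum(min(mark, maxima[part]) for part, mark in awarded.items())

# Marks awarded to one hypothetical script.
print(total_score({"content": 4, "organisation": 3,
                   "mechanical accuracy": 4, "communication": 5}))  # prints 16
```

The component maxima sum to the scheme's stated total of 20 marks, so no script can exceed it.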

History Question

What were the causes and consequences of the Fulani jihad on northern Nigeria?

1. Introductory background to the jihad (5 marks)

(i) Decline of Islam in early 19th century.

(ii) Effects of fall of Songhai Empire on Islam

(iii) 18th Century Jihad by Ibrahim Musa, Suleiman Bal, Uthman Dan Fodio

2. Causes of Jihad in Hausa land (8 marks)

(i) Remote causes

Political, religious, economic (e.g. excessive taxation of the Fulanis); social (e.g. class struggle, oppressive
rule of the Habe rulers).

(ii) Immediate causes

Conflict in Gobir – Uthman versus Yunfa.

3. Consequences (7 marks)

(i) Collapse of Hausa rule in Northern Nigeria

(ii) Weakening of Oyo Empire

(iii) Improvement in education, trade, social mobility

(iv) Strengthening and spread of Islam in Hausa land etc.

Total Score = 20 marks

Obe (1980) outlined the following advantages and disadvantages of the point-score method of marking essay questions.
Advantages of the Point Score Method

1. The point score technique is based on a definite marking scheme and therefore defensible.
2. It improves the consistency of marking as the marking of one question at a time helps to sustain the mental set.
3. Boredom and fatigue can be minimised by having short marking sessions per question, with periods of relaxation in
between.
4. An external examiner easily moderates the marking.
5. It is generally more reliable.

Disadvantages of Point Score Method

1. The point-score method is more time-consuming and painstaking.


2. When the number of scripts is very large, this technique can be laborious.

Increasing the Reliability of Essay Marking

The major way to ensure the reliability of essay marking is the use of a marking scheme. However, Mehrens and Lehmann (1978)
offered the following suggestions:

1. Check your marking scheme against actual responses. Before marking, the teacher should select a few papers at random to
ascertain the appropriateness of the marking guide.
2. Be consistent in your grading. Ensure you are not influenced by the first few papers you read, thereby marking either too
leniently or too harshly depending on your initial mindset. Occasionally refer to the first few papers marked to ensure that the
standards are being applied consistently.
3. Randomly shuffle the papers before grading them. It is generally assumed that the student’s essay grade may be influenced
by the position of his paper, especially if the preceding scripts were very good or very poor. Shuffling the papers at random
before marking minimises the effects of the preceding grades.
4. Mark only one question at a time for all papers.
5. Mark the scripts anonymously. To protect the student from teacher bias, teachers are advised to cover the names
of the students before marking their papers.
6. The mechanics of expression should be judged separately from what the student writes. The proportion of the question’s
points to be assigned to such factors as legibility, spelling, punctuation and grammar should be spelt out in the marking
guide and the students should be so informed.
7. Try to score all responses to a particular question without interruption.
8. If possible, have two independent readings of the script and use the average as the final score. A double reading by two
independent readers will make the scores more reliable.
9. Provide comments and correct errors.
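Suggestion 8, averaging two independent readings, amounts to the following; the readers' marks are invented for illustration.

```python
# Final essay scores from two independent readers: each script's
# final score is the average of its two readings.

def final_scores(reader_one, reader_two):
    """Average each script's two independent marks."""
    return [(a + b) / 2 for a, b in zip(reader_one, reader_two)]

# Marks given to three scripts by two readers.
print(final_scores([12, 15, 9], [14, 13, 9]))  # prints [13.0, 14.0, 9.0]
```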

UNIVERSITY OF LAGOS

QUALITY ASSURANCE AND SERVICOM UNIT

OFFICE OF THE VICE CHANCELLOR


TEACHING WORKPLAN TEMPLATE

Faculty: Education

Department: Educational Foundations

Session: 2019/2020

Semester: First

Course Code FED 313

Title of Course: Measurement and Evaluation

Course Units: 2 units

Course Lecturers: Dr. (Mrs.) O.M. Alade (OMA)

Dr. C. E. Okoli (CEO)

Dr. (Mrs.) O.O. Akanni (OOA)

Teaching Schedule: Friday 10am – 12noon

Venue: Any available lecture room

COURSE DESCRIPTION

This course is designed to equip student teachers with skills to assess and report the extent of the changes that have taken place in
students after a prescribed period of instruction. Similarly, students are exposed to different ways of appraising learners’
characteristics such as attitudes, aptitudes, interests, values, social relations, personality and intelligence, in order to gain a better
understanding of the extent to which they possess these attributes. This course deals extensively with various devices to measure
behaviour changes of students in the cognitive, affective and psychomotor domains.

LEARNING OUTCOMES

By the end of this course, students should have achieved the following objectives:

1. Gain mastery of the key concepts in measurement and evaluation, such as measurement, tests, evaluation, scales of
measurement and types of evaluation techniques in the classroom.
2. Get acquainted with various types of tests and other measurement instruments used to assess behaviour changes in the
cognitive, affective and psychomotor domains.
3. Develop a test blueprint and use it to construct essay and objective tests to cover specified objectives.
4. Administer a test, score it, and undertake analysis of test items with a view to improving the overall quality of the test.
COURSE RESOURCES

Hopkins, C.D. & Antes (1985). Classroom Measurement and Evaluation. Illinois: F. E. Peacock Publishers, Inc.

Mehrens, W. A. & Lehmann, I. J. (1978). Measurement and Evaluation in Education and Psychology. New York: Holt, Rinehart and Winston.

Nwana, O. C. (1982). Educational Measurement and Evaluation for Teachers. Lagos: Thomas Nelson (Nigeria) Ltd.

Obe, E. O. (1980). Educational Testing in West Africa. Lagos: Premier Press & Publishers.

Okoli. C. E. (2005). Introduction to Educational and Psychological Measurement. Lagos: Behenu Press and
Publishers.

Sax, G. (1980). Principles of Educational and Psychological Measurement and Evaluation. California: Wadsworth
Publishing Co.

COURSE CONTENT

Week Topic Lecturer(S) Remarks


1. Definition of basic concepts – education, measurement, test, and OMA/CEO/OOA
evaluation.
2. Scales of measurement OMA/CEO/OOA
3. The role of measurement and evaluation in education OMA/CEO/OOA
4. The use of measurement and evaluation techniques in the OMA/CEO/OOA
classroom.
5. Measurement of cognitive, affective and psychomotor changes OMA/CEO/OOA
in behaviour
6. Achievement tests – definition, types and uses OMA/CEO/OOA
7. Basic steps in constructing teacher-made achievement tests OMA/CEO/OOA
8. Constructing objective tests OMA/CEO/OOA
9. Constructing essay tests OMA/CEO/OOA
10. Test administration OMA/CEO/OOA
11. Marking objective and essay tests OMA/CEO/OOA
12. Item analysis OMA/CEO/OOA
13. Statistical treatment of test scores with emphasis on descriptive OMA/CEO/OOA
statistics
14. Transformation of raw scores to standard scores OMA/CEO/OOA
15. Qualities to consider in selecting tests and other measurement OMA/CEO/OOA
instruments.
COURSE AND UNIVERSITY POLICIES:

Attendance

Students enrolled in each course are expected to attain at least 65% attendance at lectures in order to be eligible for the examination.

Evaluation

Continuous Assessment = 40 marks

Examination = 60 marks

Total = 100 marks

Signature of Lecturers Head of Department

SCALES OF MEASUREMENT

There are four scales of measurement. These are nominal, ordinal, interval and ratio scales.

(a) Nominal scale: Measurement is on a nominal scale whenever numbers are used merely to describe or name rather than to
indicate the order or amount of something. The only measure of a characteristic available from the nominal scale is equivalence
(all items in a given category or with a given scale value are equal). Measurement on a nominal scale entails using numbers or
letters to represent, for example, states of origin, sex, occupation, numbers on football uniforms, etc. These numbers are used
to designate individuals or groups; no comparison can be made based on magnitude.

(b) Ordinal scale: This is a scale of rank or relative importance. It contains two degrees of measure: equivalence and relative
importance (greater than or less than). Measurement is on ordinal scale when numbers refer to ranks of objects or events on
some order of merit. For example, using numbers to designate the order of finishing a race, contest or positions after an
examination. There are no equal differences between ranks. Ordinal scales indicate greater than or less than; they provide no information on how much difference exists between two ranks. For example, we cannot say with certainty that a student who came second in a class test performed twice as well as the student who came fourth.

(c) Interval scale: A measurement is on an interval scale when equal differences can be interpreted as being equal in whatever characteristic is being measured. The interval scale has the properties of equivalence, relative importance and a unique
unit of measurement (equal differences). The Celsius scale of temperature is a good illustration of an equal-interval scale. For
example, we can say with certainty that the difference between 60°C and 40°C is equal to the difference between 30°C and 10°C, both numerically and in terms of temperature (heat). The scores of intelligence tests and classroom achievements are
measured on the interval scale. One limitation of the interval scale is the lack of absolute zero point.

(d) Ratio scale: A ratio scale has all the properties of an interval scale plus an absolute or true zero. Examples are scales on
which we measure time, length, and weight.

Roles of Measurement and Evaluation in Education

The following are the major roles of measurement and evaluation in education.

1. It serves as the tool for decision making especially to admit students into various institutions of learning.
2. The teacher uses measurement and evaluation to plan his teaching and to make decisions concerning the pace and
progress of teaching.
3. It provides information needed in the guidance and counselling of students.
4. It serves as tool to promote students to new classes after each academic year.
5. It is used for certification of students at the end of their training programme.
6. Measurement and evaluation provides the basis for the modification of an instructional curriculum.
7. It serves as a means of reporting students’ progress to their parents and guardians.

THE USE OF EVALUATION TECHNIQUES IN THE CLASSROOM

There are four major evaluation techniques. These are placement, formative, diagnostic, and summative evaluation.

Placement Evaluation

This is used to appraise the students’ entering behaviour. The entering behaviour embraces everything that the student is
carrying with him as he waits for the lesson to commence. It is the learning a child has acquired that is relevant to the lesson or
topic to be taught. It is not necessarily the knowledge acquired in the previous lesson, but in some cases, the knowledge
acquired in the previous lesson may have relevance to the topic to be taught. The entry behaviour determines the readiness
level of the students. The teacher accomplishes the role of placement evaluation through the administration of pre-tests. It
could be a readiness or placement test. This enables the teacher to get the students psychologically and intellectually ready for
his teaching.

Formative Evaluation
The formative evaluation is used while the lesson is being taught to monitor the learning progress. The formative evaluation
helps to form the students’ new behaviour. This is accomplished by supplying feedback to the students. To be effective, the
feedback should be informative and rewarding. The informative feedback tells the student the extent to which he is responding
correctly or incorrectly and identifies the specific learning errors that need correction. The rewarding feedback serves as
reinforcement for successful learning. This could be in the form of praise, smiles, clapping of hands, etc. Formative evaluation is not for assigning
of grades. The teacher makes use of oral questions, short written exercises and other activities.

Diagnostic Evaluation

The teacher uses the diagnostic evaluation to determine the causes and areas of a student’s learning difficulties and to
formulate a plan for remedial action. When a student records persistent and recurring difficulties that are left
unresolved by the corrective prescription of formative evaluation, diagnostic tests are administered to ascertain the specific
areas of weakness in order to improve his performance.

Summative Evaluation

This is a type of evaluation done at the end of a course to assign grades or to measure the success of the instruction. How much
a student has learned is determined by subtracting his entry behaviour from the terminal behaviour. Summative evaluation is
used to ascertain the extent to which instructional objectives have been achieved. It is used primarily for assigning of course
grades and for certifying pupil mastery of the intended learning outcomes. The teacher-made tests and observational
techniques are used.

ITEM ANALYSIS

Definition

Item analysis is the process of examining the students’ responses to each test item in order to judge the quality of the items. It is a statistical technique of reviewing every item on the test with a view to refining the whole test. The technique helps us not only to identify poor items, but also to decide why an item is not functioning as one had planned. Items can be analysed qualitatively, in terms of their content and form, and quantitatively, in terms of their statistical properties. Qualitative analysis includes the consideration of content validity and the evaluation of items in terms of effective item-writing procedures. Quantitative analysis principally includes the measurement of item difficulty and item discrimination. Item analysis helps us to answer questions such as the following for each item:

1. How hard is the item?

2. Does it distinguish between the better and poorer students?

3. Do all the options attract responses, or are there some that are so unattractive that they might as well not be included?

Three things are considered in item analysis, namely: item difficulty, the discriminating ability of the item and item choice analysis.

Item Difficulty

The item difficulty pertains to the easiness of the item. A good item should be neither too easy nor too difficult. The difficulty index of an item is the proportion of the testees who got the item right. The difficulty index ranges from zero to one (or from zero to 100%). Items whose difficulty index ranges from 20 to 80 percent are acceptable. The best difficulty index is 0.5. An item should not be so difficult that almost all the testees missed it or so easy that every testee got it right. The formula for item difficulty (P) is:

P = (U + L) / N

Where:

P = the difficulty index for a given item.

U = number of students in the upper group who got the item right.

L = number of students in the lower group who got the item right.

N = number of students in the item analysis group.

Item Discriminating Power

The discriminating power of an item is the extent to which the item distinguishes between those who scored low or high in the test. It measures how well a test item contributes to separating the upper and lower groups. Item discrimination tells us if an item is showing the differences between capable and less capable students. A good discriminating item is one which a greater number of the students who scored highly get right and few of the students who scored very low get right. The discrimination index (D) can take values ranging from –1.00 to +1.00. The higher the D value, the better the item discrimination. Any item that has a D value of +.40 and above is considered very effective. However, D values that range between +.20 and +.39 are considered satisfactory. Any item that has a negative value should be discarded. The formula for the item discrimination power is:

D = (U – L) / (N/2)

Where:

D = item discrimination power

U = number of students in the upper group who got the item right.

L = number of students in the lower group who got the item right.

N = number of students in the item analysis group

Item Choice Analysis

This is done to determine the effectiveness of the distracters. After identifying the poor test items, such as items that are too easy, too difficult, or those with zero or negative discrimination, there is need to ascertain what is wrong with these items. Analysing the effectiveness of distracters entails a comparison of the responses of students in the upper and lower groups. In doing this, the following points should be kept in mind:

1. Each distracter should be selected by about an equal number of the lower group.

2. Substantially more students in the upper group than the lower group should respond to the correct alternative.

3. Substantially more students in the lower group should respond to the distracters. If a distracter confuses only the upper group, it is likely faulty. If it distracts both groups, replace it. This could be because of a vague or ambiguous question or because the examiner marked with a wrong key.


Steps in Item Analysis

1. Administer the test, score the items and arrange the students’ scores in order of merit (highest to lowest).

2. Select the item analysis group (N). This is made up of:

(i) The upper group (best 30% or so).

(ii) The lower group (last 30% or so).

3. Beginning with item number one, count how many students in the upper group (U) got it right. Thereafter, count how many students in the lower group (L) got the item right.

4. Repeat step 3 for the other items.

5. For each item, compute the item difficulty.

6. For each item, compute the item discrimination power.

7. Identify the poor items and analyse their item choices or the effectiveness of the distracters.
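The steps above can be sketched in code. This is an illustrative fragment, not part of the course material: the function names are our own, and the computations follow the standard indices defined earlier, P = (U + L)/N and D = (U − L)/(N/2), assuming equal-sized upper and lower groups.

```python
# Item analysis sketch: difficulty (P) and discrimination (D) indices.
# Assumes the upper and lower groups are equal in size, so N = 2 * group size.

def item_difficulty(upper_correct, lower_correct, n_total):
    """P = (U + L) / N: proportion of the analysis group answering correctly."""
    return (upper_correct + lower_correct) / n_total

def item_discrimination(upper_correct, lower_correct, n_total):
    """D = (U - L) / (N / 2): ranges from -1.00 to +1.00."""
    return (upper_correct - lower_correct) / (n_total / 2)

# Hypothetical item answered correctly by 14 of the 20 upper-group students
# and 9 of the 20 lower-group students (N = 40).
P = item_difficulty(14, 9, 40)
D = item_discrimination(14, 9, 40)
print(P, D)   # 0.575 0.25
```

An item with these values would be accepted: its difficulty lies between 0.20 and 0.80 and its discrimination is satisfactory (at least +.20).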

Example One

Items 1,2,3,4,5……………..48, 49 and 50 on a Mathematics achievement test were passed by the following number of
"upper" and "lower" students:

Item no               1    2    3    4    5   …..  48   49   50

20 "Upper" Students   18   14   5    20   14       8    12   10

20 "Lower" Students   16   9    2    20   11       16   10   5

Obtain the item difficulty and item discriminating power per item and comment.
Table 1: Solution to Example One

Item No   Upper (U)   Lower (L)   Remarks
          20          20
1         18          16          Too easy (Reject)
2         14          9           Good (Accept)
3         5           2           Too difficult (Reject)
4         20          20          Too easy (Reject)
5         14          11          Good (Accept)
48        8           16          Negative discrimination (Reject)
49        12          10          Good (Accept)
50        10          5           Fair (Accept)

Example of item Choice Analysis for Item 48

Choice of Options

A B C D E Total
Upper Group (20) 4 6 1 8 1 20

Lower Group (20) 2 2 0 16 0 20

The correct option is D

Based on the calculations in Table 1, items 2, 5, 49 and 50 are satisfactory because their difficulty level and discriminating power are within acceptable ranges. Item number 3 should be rejected because it is too difficult. Similarly, items 1 and 4 are too easy and should be rejected. Even though item 48 has a good difficulty level, its discrimination index is negative. It should therefore be discarded. The examiner should undertake item choice analysis to determine why items 4 and 48 recorded zero and negative discriminating power respectively. Either the questions were vague or the examiner may have used a wrong key in marking the items.

Revision Exercise

1. Items 1, 2, 3, 4, 5, ….. 46, 47, 48, 49, 50 on a Mathematics achievement test were respectively passed by the following number of "upper" and "lower" students:

Item No               1    2    3    4    5   …..  46   47   48   49   50

25 "Upper" Students   21   17   25   9    14  …..  20   14   5    14   17

25 "Lower" Students   14   12   25   9    11  …..  12   11   3    16   11

Obtain the item difficulty and item discriminating power per item and comment.

Item No   Upper (U)   Lower (L)   Remarks

2
3

48

49

50

References

Anastasi, A. (1982). Psychological Testing. New York: Macmillan Publishing Co., Inc.

Obe, E. O. (1980). Educational Testing in West Africa. Lagos: Premier Press & Publishers.

Okoli, C.E. (2005). Introduction to Educational and Psychological Measurement. Lagos: Behenu Press and Publishers.

Note: Distracters that are not chosen by any examinees should be replaced or eliminated. They are not contributing to the test's ability to discriminate the good students from the poor students.

STATISTICAL TREATMENT OF TEST SCORES


Definition of Statistics

Statistics refers to the mathematical techniques used in gathering, organising, analysing and interpreting numerical data. Raw test data are meaningless unless they are statistically treated. The statistical methods could be descriptive or inferential.

Types of Statistics

1. Descriptive Statistics
Descriptive statistics comprises those methods concerned with collecting and describing a set of data to yield
meaningful information. It provides information only about the collected data and in no way draws inferences or
conclusions concerning a larger set of data. Descriptive statistics is concerned with the numerical description of a
particular group. No conclusions are extended beyond the group described. The data describe one group and that
one group only. If a teacher is only interested in a description of the performance of a specified class on a particular
test and not in further generalisation, he or she is dealing with a problem of descriptive statistics. Presentation of data
in the form of graphs and tables also falls under the heading of descriptive statistics. Descriptive statistics has
three measures. These are:

a) measures of central tendency,


b) measures of variability and
c) measures of relationship.

2. Inferential Statistics
Inferential statistics comprises those methods concerned with the analysis of a subset of data leading to predictions or
inferences about the entire set of data. In inferential statistics, the teacher selects a sample that is representative of
the population and uses the information obtained from the sample to make inferences about the parameters of the
population. Examples of inferential statistics include Student's t-test and the F-test.

In this course material, we shall consider the following methods of descriptive statistics, namely

a) Organising data into ungrouped and grouped frequency distributions


b) Graphical representation of data
c) Measures of central tendency
d) Measures of variability
e) Measures of relationship

A. Organising Test Data Into Ungrouped And Grouped Frequency Distributions


The list of test scores in a teacher’s grade book is an example of unorganised data. The scores are difficult to
interpret without some type of organisation. The organisation could be in the form of ungrouped or grouped frequency distributions.

(i) The Ungrouped Frequency Distribution


Example:

Organise the following scores into an ungrouped frequency distribution

69 74 67 74 66 70 68 71 75 80

80 72 69 75 68 76 74 74 73 73

73 76 72 72 72 65 64 75 65 70

71 73 71 77 64 66 72 70 78 77

72 71 71 72 78 68 73 76 67 70

Guidelines

1. Locate the position of the lowest and highest scores.


2. List all the score values between and including the two extreme scores in their ascending or descending order of
magnitude.
3. Take each score in the series and tally it against the corresponding score on the list.
4. Write out the number of tallies in figures.
5. If the range (highest score minus lowest score) is more than 20, it is advisable to group the data.
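The tallying described in the guidelines can be sketched with the standard library. This is an illustrative fragment, not part of the course material, using the first twenty of the scores above.

```python
from collections import Counter

# Ungrouped frequency distribution: tally each score value between the
# extremes and list the values in descending order of magnitude.
scores = [69, 74, 67, 74, 66, 70, 68, 71, 75, 80,
          80, 72, 69, 75, 68, 76, 74, 74, 73, 73]

tally = Counter(scores)
for value in range(max(scores), min(scores) - 1, -1):   # descending order
    print(value, tally.get(value, 0))
```

The range here is 80 − 66 = 14, which is under 20, so grouping would not be required by guideline 5.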

(ii) The Grouped Frequency Distribution


Example

Organise the following tests scores into a grouped frequency distribution.

84 82 70 72 80 62 94 86 68 68

77 89 85 86 46 48 84 88 89 78

86 57 81 70 55 88 79 69 52 61

68 50 77 90 77 78 89 81 67 91

58 73 77 80 78 76 76 83 72 78

Guidelines

1. Obtain the range of the scores by subtracting the lowest score (L) from highest score (H) in the array.
2. Select the desired class width preferably an odd number.
3. Determine the number of class intervals by dividing the range by the desired class width.
4. The class interval containing the highest score values should be placed on top and vice versa.
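The four grouping steps can be sketched as follows. This is an illustrative fragment, not part of the course material; the odd class width of 5 and the helper names are our own choices.

```python
# Grouped frequency distribution following the guidelines above.
scores = [84, 82, 70, 72, 80, 62, 94, 86, 68, 68,
          77, 89, 85, 86, 46, 48, 84, 88, 89, 78,
          86, 57, 81, 70, 55, 88, 79, 69, 52, 61,
          68, 50, 77, 90, 77, 78, 89, 81, 67, 91,
          58, 73, 77, 80, 78, 76, 76, 83, 72, 78]

low, high = min(scores), max(scores)
score_range = high - low                 # step 1: range = H - L
width = 5                                # step 2: an odd class width
n_classes = -(-score_range // width)     # step 3: range / width, rounded up

# Step 4: build intervals from the bottom, then print highest interval first.
intervals = [(low + i * width, low + i * width + width - 1)
             for i in range(n_classes)]
for lo, hi in reversed(intervals):
    freq = sum(1 for s in scores if lo <= s <= hi)
    print(f"{lo}-{hi}: {freq}")
```

With these data the range is 94 − 46 = 48, giving ten class intervals of width 5.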

B. Graphical Representation Of Data


Frequency distributions are often shown graphically using histograms, frequency polygons, bar charts and pie charts.

(i) Frequency Histogram


This consists of a series of rectangles, each representing the number of scores in a specific class interval. The
vertical lines of each rectangle are drawn at the exact limits of the specific class interval, and the height of the
rectangle is determined by the frequency of scores within the class interval.
Procedure for Constructing a Histogram

1. Construct a group distribution of the data.


2. Choose a convenient scale to represent a unit of frequency and another one to represent a unit of class interval.
3. Draw two axes – vertical line and horizontal line.
4. Represent the frequencies along the vertical axis and the exact limits of class interval along the horizontal axis.
5. For each class interval, draw two vertical lines whose height corresponds to the frequency of scores falling within
that class interval.

(II) Frequency Polygon


To plot the frequency polygon, we first mark the midpoint above each class interval. The height of the points should
correspond with frequency of the respective intervals. The points are then connected to each other to form a
frequency polygon.

(iii) Bar and Pie Charts


Assuming that students enrolment in some courses offered by the Faculty of Education, University of Lagos in a
given session is: Educational Administration (120), Adult Education (40), G & C (55), Business Education (80), and
Chemistry Education (30). We can present the enrolment rates in the various courses using bar and pie charts. To
construct the pie chart, we first determine the proportion of the circle, in degrees, that each course will
cover. The entire pie chart (which is circular in form) covers 360 degrees.
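The degree allocation for each sector can be computed as below. This is an illustrative sketch, not part of the course material; each sector's angle is the course's share of total enrolment multiplied by 360 degrees.

```python
# Each course's sector of the pie chart, in degrees:
# degrees = (enrolment / total enrolment) * 360
enrolment = {"Edu. Adm": 120, "Adult Educ": 40, "G & C": 55,
             "Bus. Edu": 80, "Chem Edu": 30}

total = sum(enrolment.values())          # 325 students in all
degrees = {course: n / total * 360 for course, n in enrolment.items()}

for course, deg in degrees.items():
    print(f"{course}: {deg:.1f} degrees")
```

For example, Educational Administration takes 120/325 × 360 ≈ 132.9 degrees of the circle, and the five sectors sum to 360 degrees.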

With the aid of a protractor and a semi circle found in the mathematical set, one can carefully draw a pie chart. The
bar and pie charts for the data in Table 9-3 are presented below.
Bar Chart: enrolment rate (0 to 140 on the vertical axis) plotted against the five courses (Edu. Adm, Adult Educ, G & C, Bus. Edu, Chem Edu).

Pie Chart: sectors for Edu. Adm (120), Adult Educ (40), G & C (55), Bus. Edu (80) and Chem Edu (30).

C. Measures Of Central Tendency


The measure of central tendency (or average) for any given set of scores is the most typical score in the set. It can
also be described as the most central score or the score that is representative of the group. The measures of central
tendency commonly used are the mean, median and mode.

(i) Mean

The mean is the sum of all the scores in the data set divided by the number of scores. The mean is denoted by X̄
(pronounced X bar). The formula for computing the mean of ungrouped data is:

X̄ = ∑X / N
Where:

∑ = a Greek letter which stands for summation.

X = individual raw scores.

N = total number of scores.

Example: Find the mean of 10, 12, 8, 11, 12, 13, 9, and 7

X̄ = (X1 + X2 + X3 + X4 + X5 + … + X8) / N

= (10 + 12 + 8 + 11 + 12 + 13 + 9 + 7) / 8

= 82 / 8

= 10.25

Uses of Arithmetic Mean

1. The mean takes account of all the score values in a set of data. It reflects the value of extreme scores in
the data.
2. When further calculations are required, the mean is the average that will lend itself to further mathematical
manipulations.

(ii) Median
The median is the middle observation of a data set when ordered from the smallest to the biggest. When a set of
data is arranged in order of magnitude, the median is the score that cuts the data into two. If there are two
observations in the middle (this happens when the scores are even-numbered), the median is the average of the two.

Example 1: Find the median of 1 2 5 8 6 4 3

Solution: 1 2 3 4 5 6 8. The median is 4.

(Note that the scores are arranged in order of magnitude).


Example 2: Find the median of 16 14 12 18 22 20

Solution: 12 14 16 18 20 22 (Note that the number of scores is even). The median is the average of the
two middle scores: (16 + 18) / 2 = 17.

Uses of the Median

1. When the purpose is to find an average that is not affected by the extreme figures in the population.
2. If the curve is positively skewed, the median is preferred.

(iii) Mode
The mode of a set of scores is the score that has the highest frequency. In grouped data, the mode is the midpoint
of the class interval (modal class) that has the highest frequency.

Example: Find the mode of 3 3 4 5 6 10 3 9 8 3 7.

The answer is 3 because it is the commonest observation or the score with the highest frequency.
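The three averages can be computed with Python's standard library. This is a minimal sketch, not part of the course material, reusing the score sets from the worked examples above.

```python
import statistics

# Mean: sum of the scores divided by their number.
mean = statistics.mean([10, 12, 8, 11, 12, 13, 9, 7])        # 10.25

# Median: middle score (or average of the two middle scores when even).
median_odd = statistics.median([1, 2, 5, 8, 6, 4, 3])         # 4
median_even = statistics.median([16, 14, 12, 18, 22, 20])     # 17

# Mode: the score with the highest frequency.
mode = statistics.mode([3, 3, 4, 5, 6, 10, 3, 9, 8, 3, 7])    # 3

print(mean, median_odd, median_even, mode)
```

Note that the library sorts the data internally, matching the guideline that scores must first be arranged in order of magnitude before locating the median.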

D. Measures Of Variability
The measure of variability for a set of scores (also called measure of spread, scatter or dispersion) tells us the extent
to which the scores deviate from one another. There are three common measures of variability. These are the range,
quartiles and the standard deviation.

(i) Range
The range is the difference between the highest (H) and lowest (L) score in an array.

Example: Find the range of 10 9 13 16 4 6 12.

Range =H - L

= 16 - 4

= 12

Just like the mode, the range provides us with a quick, rough check of the spread or scatter of the scores. However,
it uses only the highest and lowest scores, ignoring the other scores in the group.
(ii) Quartiles
Quartiles are points that divide the frequency distribution into equal fourths. A quartile is a point on a distribution below which
25%, 50% or 75% of the scores lie. Suppose a data set is ordered from the least to the highest; the values that divide
the data into four equal sets are called quartiles. Q1, Q2 and Q3 denote the quartiles. Q1 or P25 is the lower
quartile, which cuts off 25% of the data from below. Q2 or P50 (the median) is the second quartile, which cuts off 50%
of the data from below. Q3 or P75 is the upper quartile, which cuts off 75% of the data from below.

The score distance between the third and first quartiles (Q3 and Q1) is called the interquartile range. The semi-interquartile range (Q) is half of the interquartile range. As a measure of variability, the semi-interquartile range is the
average distance from the median to the first and third quartiles, i.e., it tells how far the quartile points lie from the median,
on the average. The semi-interquartile range is

Q = (Q3 – Q1) / 2

The semi-interquartile gives us the information about the variability within the middle 50%. In a situation where the
median is the measure of central tendency, the appropriate measure of variability is the semi-interquartile range.

Example: Find the 1st, 2nd and 3rd quartiles of the following scores:

5 6 12 10 20 2 3 9 18 22 5 13

Before computing the quartile values, first arrange the scores in order of magnitude as shown below:

2 3 5 5 6 9 10 12 13 18 20 22.

The number of scores in the data set is 12.

 The first quartile (Q1 or the 3rd score) is 5.


 The second quartile (Q2 or the 6th score) is 9.
 The third quartile (Q3 or the 9th score) is 13.

(iii) Standard Deviation


The standard deviation (denoted by S or S.D.) is used to describe the amount of variability in a distribution. It tells us
the extent to which the scores deviate above or below the mean. Mathematically, the standard deviation is the square root of the
mean of the sum of squared deviations from the mean. When the standard deviation is high, we can conclude that
the marks are widely distributed, which is an indication that the students differed markedly in their performance, and
vice versa. If the class mean is high while the standard deviation is low, a teacher can conclude that the objective of a
particular unit has been achieved. If both the class mean and the standard deviation are low, it implies that the
objective of that unit has not been achieved. The standard deviation can be computed using three methods. These
are: (a) the deviation method, (b) the raw score method and (c) the assumed mean method.
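The deviation method can be sketched directly from the definition above: take each score's deviation from the mean, square it, average the squares, and take the square root. This is an illustrative fragment, not part of the course material; it assumes the whole class is treated as the population (division by N), and the sample scores are hypothetical.

```python
import math

# Deviation method: square root of the mean of the squared deviations
# of the scores from their mean.
def standard_deviation(scores):
    mean = sum(scores) / len(scores)
    squared_deviations = [(x - mean) ** 2 for x in scores]
    return math.sqrt(sum(squared_deviations) / len(scores))

scores = [10, 12, 8, 11, 12, 13, 9, 7]   # hypothetical class test scores
sd = standard_deviation(scores)
print(round(sd, 2))
```

A standard deviation near zero, as in a class where everyone scores the same mark, indicates no spread at all.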

E. Measures Of Relationship

Correlation

Correlation is the relationship between two or more paired variables, that is, two or more sets of data. A teacher may
be interested in finding whether a relationship exists between students' achievement in say Mathematics and Physics
or between Chemistry achievement in mock and Senior School Certificate Examination (SSCE). The relationship
could be negative, zero or positive. A negative relationship means that an increase in variable x is likely to lead to
decrease in variable y and vice versa. On the other hand, a positive relationship implies that an increase in the value
of variable x is likely to lead to increase in variable y and vice versa.

The degree of relationship is represented by the correlation coefficient. This coefficient is denoted by either the Greek
letter rho or the symbol r, depending upon certain assumptions about the distribution and the method used to
compute the coefficient. The resulting coefficient can range from –1.0 to +1.0, where 0 indicates no correlation
between the data sets. A correlation of +1.0 or –1.0 means that you can always predict one data set from the values
in another. The correlation coefficient can be computed using either the Pearson’s product moment or the
Spearman’s rank difference method.

(i) The Product Moment Correlation


The product moment correlation is used when the number of cases is large (more than 30) and when there is a large
number of paired scores. The product moment correlation coefficient can be computed using either the raw score or
the deviation method.

(ii) Spearman’s Rank Difference Method


This method is used to compute a correlation coefficient when there are few ties and the number of cases is relatively
small (less than 30 pairs). The Spearman's rank difference method gives only a fair estimate of the correlation
coefficient because it makes use of only the ranks and not the raw scores. Whenever two or more individuals
receive the same score, each of them is assigned the mean rank position of the tied scores. Another method is to
assign the same rank to the tied scores. In each procedure, you skip the next rank (or ranks, as the case may
be) for the score coming behind.
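The rank difference method can be sketched with the usual Spearman formula, rho = 1 − 6∑d²/(n(n² − 1)), where d is the difference between a student's two ranks. This is an illustrative fragment, not part of the course material; the function names and the six pairs of Mathematics and Physics scores are our own, and tied scores receive the mean of the rank positions they occupy, as described above.

```python
# Spearman's rank difference method: rho = 1 - (6 * sum(d^2)) / (n * (n^2 - 1)).
def mean_ranks(scores):
    """Rank from highest (rank 1) downwards, averaging ranks for tied scores."""
    ordered = sorted(scores, reverse=True)
    return [sum(i + 1 for i, s in enumerate(ordered) if s == x) /
            ordered.count(x) for x in scores]

def spearman_rho(x, y):
    n = len(x)
    d_squared = sum((rx - ry) ** 2
                    for rx, ry in zip(mean_ranks(x), mean_ranks(y)))
    return 1 - (6 * d_squared) / (n * (n ** 2 - 1))

# Hypothetical paired scores for six students.
maths   = [80, 65, 72, 50, 90, 60]
physics = [75, 60, 70, 55, 85, 62]
print(round(spearman_rho(maths, physics), 2))
```

A coefficient this close to +1 would indicate that students who rank highly in Mathematics also tend to rank highly in Physics.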

TRANSFORMATION OF TEST SCORES

Standard Scores

Raw scores are frequently transformed to other scales to facilitate analysis and interpretation. One such example is
the standard score. The standard score expresses a person’s performance in terms of its deviation from the mean in
standard deviation units. Standard scores have several advantages.

a) First, they measure on an interval scale. By expressing performance in terms of standard deviation units, we
have transformed raw scores to a scale with equal-sized units.
b) Second, the use of standard scores allows us to compare scores from several tests directly, even those
having different means and/or standard deviations.

In this course material, we shall consider three types of standard scores and scales: sigma (z) score, standard Z or T
score, and stanine scale.

(i) Sigma score (z -score)


The z-score is used to reduce scores in different distributions to a common, comparable unit of measurement. It
indicates the distance of a raw score from the mean in standard deviation units. It has a mean of 0 and a standard
deviation of 1. The z-score’s sign (+ or -) shows whether the score is above or below the mean. In comparing or
averaging scores on tests where total values differ, the use of raw scores to compute a mean or average may create
a false basis of comparison. The z-score makes possible equal weighting of the tests. The formula for computing a z-score is

z = (X – M) / S

Where:

X = raw score

M = mean

S = standard deviation
Example:

A teacher wished to get a student’s equally weighted average on a Mathematics and a Geography test. Using the
table below, determine the subject in which the student performed better.

Subject        Raw Score   Mean   Highest Possible Score   Standard Deviation

Mathematics    45          48     60                       5

Geography      62          55     86                       6

Mathematics z-score = (45 – 48) / 5 = –0.6

Geography z-score = (62 – 55) / 6 = +1.16

The score of 45 in mathematics may be expressed as z-score of –0.6, indicating that 45 is –0.6 standard deviations
below the mean. The score of 62 may be expressed as a sigma score of +1.16, indicating that 62 is +1.16 standard
deviation above the mean. Hence, the student performed better in geography.

(ii) The T-score


The T-score was devised to avoid possible confusion resulting from negative scores (below the mean) and to
eliminate decimals. This is done by multiplying the z-score by 10 and adding 50 to it. Multiplying the z-score by 10,
and approximating, eliminates the decimal from the z-score. Adding 50 to this product ensures that the final score will
be a positive whole number, that is, it eliminates negative numbers. The T-score has a mean of 50 and a standard
deviation of 10. The formula for the T-score is

T = 10z + 50

Mathematics T = 10 (–0.6) + 50 = –6 + 50 = 44

Geography T = 10 (+1.16) + 50 = 11.6 + 50 = 62 (to the nearest whole number)
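The two transformations can be sketched together, using the Mathematics and Geography figures from the worked example above. This is an illustrative fragment, not part of the course material; the function names are our own, and the T-score is rounded to eliminate decimals as described.

```python
# z-score: distance of a raw score from the mean in standard deviation units.
# T-score: 10z + 50, which removes negative values and decimals.
def z_score(raw, mean, sd):
    return (raw - mean) / sd

def t_score(raw, mean, sd):
    return round(10 * z_score(raw, mean, sd) + 50)

maths_z = z_score(45, 48, 5)     # -0.6 (below the mean)
geog_z = z_score(62, 55, 6)      # about +1.17 (above the mean)
print(maths_z, geog_z)
print(t_score(45, 48, 5), t_score(62, 55, 6))   # 44 62
```

Because both T-scores sit on the same scale (mean 50, standard deviation 10), 44 and 62 can be compared or averaged directly even though the two tests had different means and spreads.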


Stanine

The stanine (also called standard nine) is a method of reporting achievement test grades. Instead of using letters as
identification marks for the grades, it uses the numbers 1, 2, 3, 4, 5, 6, 7, 8, and 9. The stanine is consequently a
nine-point scoring scale using the numbers 9 down to 1. The average score in any test normally falls into stanine 5.
The stanine is recommended only where the range of marks or groups of marks in the distribution is wide enough,
preferably twenty or more, and the number of pupils tested is large. The only limitation in the use of stanine scores for
computation is the situation where there are ties in the raw scores. Too many ties make the theoretical application of
the formula difficult. The following is the percentage of scores for each stanine:

First 4% = stanine 9

Next 7% = stanine 8

Next 12% = stanine 7

Next 17% = stanine 6

Middle 20% = stanine 5

Next 17% = stanine 4

Next 12% = stanine 3

Next 7% = stanine 2

Last 4% = stanine 1
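The percentage bands above can be applied in code. This is an illustrative sketch, not part of the course material; it assumes there are no tied raw scores (the limitation noted above) and that the band counts round to cover the whole group, as they do for a class of 100.

```python
# Assign stanines from the percentage bands: the top 4% of ranked scores
# receive stanine 9, the next 7% stanine 8, and so on down to the last 4%.
BANDS = [(9, 0.04), (8, 0.07), (7, 0.12), (6, 0.17), (5, 0.20),
         (4, 0.17), (3, 0.12), (2, 0.07), (1, 0.04)]

def stanines(scores):
    """Map each score to its stanine; assumes no tied scores."""
    n = len(scores)
    ordered = sorted(scores, reverse=True)        # best score first
    stanine_of = {}
    start = 0
    for stanine, share in BANDS:
        count = round(share * n)                  # size of this band
        for s in ordered[start:start + count]:
            stanine_of[s] = stanine
        start += count
    return [stanine_of[s] for s in scores]

# 100 hypothetical distinct marks, from 100 down to 1.
marks = list(range(100, 0, -1))
assigned = stanines(marks)
print(assigned[0], assigned[49], assigned[99])   # 9 5 1
```

As expected, the middle of the distribution falls into stanine 5 while the extreme top and bottom marks fall into stanines 9 and 1.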
