
UNIT 2

BY:
Dr. Khusboo
Assistant Professor
TESTS
 Psychological Tests can be classified on various dimensions. One classification which is based
on the rate of performance distinguishes between Speed Test and Power Test.
 Speed Tests are the ones in which individual differences depend entirely on the speed of
performance. Items are of uniform difficulty, all of which are well within the ability level of
the persons for whom the test is designed; but the time limit is such that no examinee can
attempt all the items. Under such conditions, each person’s score reflects only the speed with
which he worked. According to Gulliksen, a pure Speed Test is a test composed of items so
easy that the subjects never give the wrong answer to any of them, that is, there would be no
attempted item that would be incorrect and, consequently, the score for each person would
equal the number of items attempted.
 Power Tests, on the other hand, have time limits long enough to permit everyone to attempt all
the items. But the difficulty of items is steeply graded and the test includes certain items which
are too difficult for anyone to solve, so that no one can get a perfect score. The level of
difficulty that can be mastered in liberal time is determined. Achievement examinations fall in
this category. In a pure Power Test, every examinee attempts all the items, so the score
depends entirely on the number of items answered correctly. Therefore, in such tests, the
items cannot be of trivial difficulty, because that would not produce a desirable
distribution of scores.
SOURCES OF INFORMATION
ABOUT TESTS
Initial inspection of a test and its manual is usually based on an introductory kit or specimen set.
Such a kit provides a copy of the most basic materials for the test, such as (1) the test
booklet; (2) an administration and scoring manual; (3) an answer sheet; (4) a report profile; and
(5) a technical manual.
 Systematic reviews

A major source of information is provided by systematic reviews of published tests. A systematic
review synthesizes the literature (research studies) on a particular topic, in this case a
psychological test. One popular systematic review source is the Mental Measurements
Yearbook (MMY), originated by Buros (and therefore also called Buros's MMY, or simply Buros).
 Special-purpose collections

Special-purpose collections are collections, often books, that provide information about tests on
a specifically selected area, for example the Handbook of Personality Assessment, or the
Integrative Assessment of Adult Personality.
 Books about single tests
Some single tests are the subject of entire books, because they are so widely
used. For example, the Wechsler scales are the subject of a number of books.
These books are commonly written with the assumption that the reader already
has had considerable training in test theory and use.
 Journals
Many journals contain articles in which psychological tests are used.
Although most journals include articles that use a particular test, rather than
examining the properties of the test itself, there are also journals that specifically
include articles reviewing tests. Examples of such journals are the Journal of
Educational Measurement, Applied Measurement in Education,
and Psychological Measurement.
NORMS
 A norm is the average or typical score (mean or median) on a particular test made by a specified
population; for example, the mean intelligence test score for a group of 10-year olds.
 Test norms consist of data that make it possible to determine the relative standing of an
individual who has taken a test. By itself, a subject’s raw score (e.g., the number of answers
that agree with the scoring key) has little meaning. Almost always, a test score must be
interpreted as indicating the subject’s position relative to others in some group. Norms
provide a basis for comparing the individual with a group.
 The characteristics of any table of norms depend on a number of factors affecting the individuals
who make up the group:
 1. In standardizing a psychological test, the norm and the distribution of scores are influenced by
the representativeness of the population sample, that is, by the proportion from each sex, their
geographic distribution, their socio-economic status and their age distribution.
 2. In devising a test of educational achievement, factors influencing the normative data, in addition
to the above, are the quality of the schools and the kinds of curricula from which this
standardization population is drawn.
 3. Norms of tests of aptitude, like clerical or mechanical, are influenced by the standardisation
population’s degree of experience, the kind of work they have been doing and by the
representativeness of the group.
TYPES OF NORMS
 PERCENTILE RANK: An individual’s percentile rank on a test designates the percentage
of cases or scores lying below it. In other words, this statistical device indicates in which
one-hundredth part of the distribution of scores an individual is located. For
example, a person with a percentile rank of 20 (P20) is situated above 20 per cent of the
group of which he is a member or, otherwise stated, 20 per cent of the group fall below this
person’s rank. Another example: if a person obtains a percentile rank of 97 in a competitive
exam, it means 97 per cent of the participants have scored less than him or her.
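The definition above can be sketched directly in code. The following is a minimal illustration; the group of scores is invented for the example.

```python
# Percentile rank: the percentage of scores in a group that fall
# below a given raw score. The sample group is invented data.
def percentile_rank(score, group):
    below = sum(1 for s in group if s < score)
    return 100 * below / len(group)

group = [40, 45, 50, 55, 60, 65, 70, 75, 80, 85]
print(percentile_rank(60, group))  # 40.0: 4 of 10 scores fall below 60
```

A score of 60 here sits above 4 of the 10 group members, giving a percentile rank of 40.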
 DECILES: Deciles are the points which divide the scale of measurement into 10 equal
parts. Thus, there are in all nine deciles, namely the first decile to the ninth decile,
denoted by D1, D2, D3, D4, D5, D6, D7, D8 and D9. The first decile may be
defined as a point on the scale of measurement below which one-tenth (1/10) or 10 per cent
of the total cases lie, the second decile as the point below which 20 per cent of the cases lie,
and so on. The term decile means a dividing point, whereas Decile Rank signifies a range of
scores between two dividing points. For example, a testee who has a decile rank of 10
is located in the highest 10 per cent of the group; one whose decile rank is 9
is in the second highest 10 per cent; one whose decile rank is 1 is in the lowest 10
per cent of the group. The Decile Rank is the same in principle as the percentile rank; but
instead of designating one-hundredth parts of a distribution, it designates one-tenth parts of
the group (N/10).
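The nine decile points D1 to D9 can be computed with Python's standard library; the scores below are an invented uniform distribution used only to show the mechanics.

```python
import statistics

# The nine decile points D1..D9 divide a score distribution into
# ten equal parts. The scores here are invented illustration data.
scores = list(range(1, 101))                  # 100 hypothetical raw scores
deciles = statistics.quantiles(scores, n=10)  # returns the 9 cut points
print(deciles)  # [10.1, 20.2, 30.3, 40.4, 50.5, 60.6, 70.7, 80.8, 90.9]
```

For this evenly spread sample, D5 (the median) falls at 50.5, and each successive decile point marks off another 10 per cent of the cases.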
 STANDARD SCORE: This index too designates an individual’s position with respect to the
total range and distribution of scores, but it is less obvious than the Percentile and
Decile Ranks. The standard score indicates, in standard-deviation units, how far a
particular score is removed from the mean of the distribution. The mean is taken as the zero
point, and standard scores are given as plus or minus values. Ultimately, standard scores must be
given percentile values to express their full significance.
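Both steps — computing the standard (z) score and converting it to a percentile value — can be sketched as follows. The mean, standard deviation, and raw score are invented numbers, and the percentile conversion assumes a normal distribution.

```python
from statistics import NormalDist

# Standard (z) score: distance of a raw score from the group mean,
# expressed in standard-deviation units. Values are invented.
mean, sd, raw = 55, 10, 70
z = (raw - mean) / sd            # +1.5 SDs above the mean
pct = NormalDist().cdf(z) * 100  # percentile equivalent, assuming normality
print(round(z, 2), round(pct, 1))  # 1.5 93.3
```

A score 1.5 standard deviations above the mean corresponds to roughly the 93rd percentile in a normal distribution, which is the "percentile value" the slide refers to.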
 STANINE: This term, coined by psychologists in the Army and Air Force during World War II,
is yet another variant of the standard score technique. In this method, the standardization
population is divided into nine groups; this ‘standard nine’ scale is termed the ‘stanine’.
Except for stanine 1 (lowest) and stanine 9 (highest), each unit spans one half of a
standard deviation.
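The mapping from a z-score to a stanine can be sketched as below. This is a minimal illustration assuming the conventional stanine boundaries at z = ±0.25, ±0.75, ±1.25 and ±1.75.

```python
def stanine(z):
    # Stanine 5 is centered on the mean; each unit from 2 through 8
    # spans half a standard deviation, while 1 and 9 are open-ended.
    # int() truncation with the +5.5 offset implements the usual
    # z = ±0.25, ±0.75, ±1.25, ±1.75 cut points.
    return min(9, max(1, int(z * 2 + 5.5)))

print(stanine(0.0), stanine(1.0), stanine(-2.5))  # 5 7 1
```

A z of 0 (the mean) lands in stanine 5; one standard deviation above the mean lands in stanine 7; anything below z = -1.75 falls into stanine 1.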
 AGE NORMS: Age norms are useful for judging the capability of a child of a particular
age. Consider a child of 10 years who obtains a reasoning score of 50, while the mean score
for the ten-year-olds in the normative sample is 55. We can then find the age group whose
mean corresponds to a reasoning score of 50. If a score of 50 corresponds to the mean of
the nine-year-olds, it can be said that this 10-year-old child has the reasoning score of a
nine-year-old child.
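The age-equivalent lookup in the example above can be sketched as a small table search. The norm table below uses invented mean scores consistent with the slide's example.

```python
# Age-norm lookup: given a table of mean reasoning scores per age
# (invented numbers following the slide's example), find the age
# whose norm best matches a child's raw score.
age_norms = {8: 45, 9: 50, 10: 55, 11: 60}   # age -> mean score in sample

def age_equivalent(raw_score):
    return min(age_norms, key=lambda age: abs(age_norms[age] - raw_score))

print(age_equivalent(50))  # 9: a score of 50 matches the 9-year-old norm
```

So the 10-year-old scoring 50 is assigned an age equivalent of nine years, exactly as the slide describes.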
 GRADE NORMS: Grade norms are similar to age norms. In this case, grade levels are
taken in place of age. These norms are very useful for teachers to understand how well
the students are progressing at each grade level.
ASSUMPTIONS OF TESTING
When using psychological tests, the following assumptions must be made:
 Psychological tests measure what they say they measure, and any inferences
that are drawn about test takers based on their test scores are appropriate. This is
also called test validity. If a test is designed to measure mechanical ability, we must assume that
it does indeed measure mechanical ability. If a test is designed to predict performance on the
job, then we must assume that it does indeed predict performance. This assumption must come
from a personal review of the test’s validity data.
 An individual’s behavior, and therefore test scores, will remain unchanged
over time. This is also called test–retest reliability. If a test is administered at a specific point
in time and then we administer it again at a different point in time (for example, two weeks
later), we must assume, depending on what we are measuring, that an individual will receive a
similar score at both points in time. If we are measuring a relatively stable trait, we should be
much more concerned about this assumption. However, there are some traits, such as mood, that
are not expected to show high test– retest reliability.
 Individuals understand test items similarly (Wiggins, 1973). For example,
when asked to respond true or false to a test item such as “I am almost always
healthy,” we must assume that all test takers interpret “almost always”
similarly.
 Individuals will report their thoughts and feelings honestly (Wiggins,
1973). Even if people are able to report correctly about themselves, they may
choose not to do so. Sometimes people respond how they think the tester wants
them to respond, or they lie so that the outcome benefits them. For example, if
we ask test takers whether they have ever taken a vacation, they may tell us
that they have even if they really have not. Why? Because we expect most
individuals to occasionally take vacations, and therefore the test takers think
we would expect most individuals to answer yes to this question. Criminals
may respond to test questions in a way that makes them appear neurotic or
psychotic so that they can claim they were insane when they committed
crimes. When people report about themselves, we must assume that they will
report their thoughts and feelings honestly, or we must build validity checks
into the test.
 Individuals will report accurately about themselves (for example, about
their personalities, about their likes and dislikes; Wiggins, 1973). When we ask
people to remember something or to tell us how they feel about something, we
must assume that they will remember accurately and that they have the ability
to assess and report accurately on their thoughts and feelings. For example, if
we ask you to tell us whether you agree or disagree with the statement “I have
always liked cats,” you must remember not only how you feel about cats now
but also how you felt about cats previously.
 The test score an individual receives is equal to his or her true ability plus
some error, and this error may be attributable to the test itself, the examiner,
the examinee, or the environment. That is, a test taker’s score may reflect not
only the attribute being measured but also things such as awkward question
wording, errors in administration of the test, examinee fatigue, and the
temperature of the room in which the test was taken. When evaluating an
individual’s score, we must assume that it will include some error.
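The last assumption is the classical test theory model: observed score = true score + error. A small simulation can illustrate why repeated measurement helps — random errors average out. The true score and error spread below are invented.

```python
import random

# Classical-test-theory sketch: each observed score equals the true
# score plus random error (test, examiner, examinee, environment).
# True score and error SD are invented illustration values.
random.seed(0)
true_score = 80
observed = [true_score + random.gauss(0, 3) for _ in range(1000)]
mean_obs = sum(observed) / len(observed)
print(round(mean_obs))  # errors cancel out, so the mean is close to 80
```

Any single administration may miss the true score by a few points, but the average over many error-laden observations converges on it.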
A GOOD TEST
 Five main characteristics of a good psychological test are as follows: 1.
Objectivity 2. Reliability 3. Validity 4. Norms 5. Practicability.
 1. Objectivity: The test should be free from subjective judgement regarding
the ability, skill, knowledge, trait or potentiality to be measured and evaluated.
 2. Reliability: This refers to the extent to which the obtained results are
consistent. When the test is administered to the same sample more than once,
with a reasonable gap of time, a reliable test will yield the same scores; that
is, the test is trustworthy. There are many methods of testing the reliability
of a test.
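One common way to quantify test–retest reliability is the Pearson correlation between the two administrations. The following is a minimal sketch with invented scores for five examinees tested two weeks apart.

```python
# Test-retest reliability sketch: correlate scores from two
# administrations of the same test. All data are invented.
def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

first  = [50, 55, 60, 65, 70]   # scores at first administration
second = [52, 54, 61, 66, 69]   # similar scores two weeks later
print(round(pearson_r(first, second), 2))  # 0.99: highly reliable
```

A coefficient near 1.0, as here, indicates that examinees keep nearly the same relative standing across administrations — the hallmark of a reliable test.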
 3. Validity: This refers to the extent to which the test measures what it intends to measure.
For example, an intelligence test developed to assess the level of intelligence should
assess the intelligence of the person, not other factors. Validity tells us
whether the test fulfils the objective of its development. There are many methods of
assessing the validity of a test.
 4. Norms: Norms refer to the average performance of a representative sample on a
given test. They give a picture of the average standard of a particular sample in a
particular aspect. Norms are standard scores developed by the test developer.
Future users of the test can compare their scores with the norms to know the level of
their sample.
 5. Practicability: The test must be practicable with respect to the time required for
completion, the length, the number of items or questions, and the scoring. The test
should not be too lengthy, nor too difficult to answer or to score.
THANK YOU
