Professional Documents
Culture Documents
Topic: Analysis of Test Scores
Topic: Analysis of Test Scores
Topic: Analysis of Test Scores
Test Scores
10
LEARNING OUTCOMES
INTRODUCTION
Do you know that all the data you have collected on the performance of students
have to be analysed? In this final topic, we will focus on the analysis and
interpretation of the data you have collected about the knowledge, skills and
attitudes of your students. Information you have collected about your students can
be analysed and interpreted quantitatively and qualitatively. For the quantitative
analysis of data, various statistical tools are used. For example, statistics are used
to show the distribution of scores on a Geography test and the average score
obtained by a group of students.
Even the use of percentages may not be meaningful. For example, getting
64 per cent in the test may be considered „good‰ if the test was a difficult one. On
the other hand, if the test was easy, then 64 per cent may be considered to be only
„average‰. In other words, to get a more accurate picture of the scores obtained by
students on the test, the teacher should find out:
(a) Which student obtained the highest marks in the class and the number of
questions correctly answered;
(b) Which student obtained the lowest marks in the class and the number of
questions correctly answered; and
(c) The number of questions correctly answered by all students in the class.
This illustrates that the marks obtained by students in a test should be carefully
examined. It is not enough to just report the marks obtained. More information
should be given about the marks obtained and to do this you have to use statistics.
Some teachers may be afraid of statistics while others may regard it as too time
consuming. In fact, many of us often use statistics without being aware of it. For
example, when we talk about average rainfall, per capita income, interest rates and
percentage increase in our daily lives, we are using the language of statistics. What
is statistics?
For practical reasons, rather than compiling data about an entire population, we
usually select or draw a subset of the population called a sample. In other words,
the 40 Form Two students that you teach is a sample of the population of Form
Two students in the country. The data you collect about the students in your class
can be subjected to statistical analysis, which serves two related purposes, namely,
descriptive and inference.
The „frequency‰ column shows how many students scored for each mark shown
and the percentage is shown in the „percentage‰ column. You can describe these
scores using two types of measures, namely, central tendency and dispersion.
Copyright © Open University Malaysia (OUM)
TOPIC 10 ANALYSIS OF TEST SCORES 223
(a) Mean
The mean is the most commonly used measure of central tendency. When
we talk about an „average‰, we usually refer to the mean. The mean is simply
the sum of all the values (marks) divided by the total number of items
(students) in the set. The result is referred to as the arithmetic mean. Using
the data from Figure 10.1 and applying the formula given, you can calculate
the mean.
Mean
x 35 40 41 ............... 75 1863 53.23
N 35 35
(b) Median
The median is determined by sorting the score obtained from lowest to
highest values and taking the score that is in the middle of the sequence. For
the example in Figure 10.1, the median is 52. There are 17 students with
scores less than 52 and 17 students whose scores are greater than 52. If there
is an even number of students, there will not be a single point at the middle.
So, you calculate the median by taking the mean of the two middle points i.e.
divide the sum of the two scores by 2.
(c) Mode
The mode is the most frequently occurring score in the data set. Which number
appears most often in your data set? In Figure 10.1, the mode is 57 because
seven students obtained that score. However, you can also have more than one
mode. If you have two modes, it is bimodal.
Figure 10.2: Graph showing the distribution of Bahasa Melayu test scores
SELF-CHECK 10.1
10.2.2 Dispersion
Although the mean tells us about the groupÊs average performance, it does not tell
us how close to the average or mean the students scored. For example, did every
student score 80 per cent on the test or were the scores spread out from 0 to
100 per cent? Dispersion is the distribution of the scores and among the measures
used to describe spread are range and standard deviation.
(a) Range
The range of scores in a test refers to the lowest and highest scores obtained
in the test. The range is the distance between the extremes of a distribution.
x x
2
Marks x x x
35 35 ă 39 = -4 (-4)2 = 16
39 39 ă 39 = 0 (0)2 = 0
45 45 ă 39 = 6 (6)2 = 36
40 40 ă 39 = 1 (1)2 = 1
32 32 ă 39 =-7 (-7)2 = 49
42 42 ă 39 = 3 (3)2 = 9
37 37 ă 39 =-2 (-2)2 = 4
44 44 ă 39 = 5 (5)2 = 25
36 36 ă 39 =-3 (-3)2 = 9
41 41 ă 39 = 2 (2)2 = 4
Sum = 390
Mean ( x ) = 39 ¡ ( x ă x )2 = 153
N = 10
Based on the raw scores, you can calculate the standard deviation of a sample
using the formula given.
x x
2
153
Standard deviation 17
N 1 9
4.12
(i) The first step is to find the mean, which is 390 divided by 10 (the
number of students) = 39;
(ii) Next is to subtract the mean from each score in the column labelled
x x . Note that all numbers in this column are positive. The squared
differences are then summed and the square root calculated; and
(iii) The standard deviation is the positive square root of 153 divided by 9
and the result is 4.12.
Note that the smaller the standard deviations, the greater the scores tend to
„bunch‰ around the mean and vice versa. Hence, it is not enough to just
examine the mean alone because the standard deviation tells us a lot about
the spread of the scores around the mean. Which class do you think
performed better? The mean does not tell us which class performed better.
Class C performed the best because approximately two-thirds of the students
scored between 38 and 40.
Copyright © Open University Malaysia (OUM)
TOPIC 10 ANALYSIS OF TEST SCORES 227
There is a negative skew because it has a longer tail in the negative direction.
What does it mean? It means that more students were getting high scores on
the History test which may indicate that either the test was too easy or the
teaching methods and materials were successful in bringing about the
desired learning outcomes.
Now, let us look at Figure 10.5 which shows the distribution of the scores
obtained by 38 students on a Biology test.
There is a positive skew because it has a longer tail in the positive direction.
What does it mean? It means that more students were getting low scores in
the Biology test which indicates that the test was too difficult. Alternatively,
it could imply that the questions were not clear or the teaching methods and
materials did not bring about the desired learning outcomes.
SELF-CHECK 10.2
ACTIVITY 10.1
With these raw scores alone, what can you say about ZulindaÊs performance on
these tests or her standing in the class? Actually, you cannot say very much.
Without knowing how these raw scores compare to the total distribution of raw
scores for each subject, it is difficult to draw any meaningful conclusion regarding
her relative performance in each of these tests.
How do you make these raw scores meaningful? Let us assume that the scores of
all three tests are approximately normally distributed.
The mean and standard deviation of the three tests are as shown in Table 10.2.
Table 10.2: Mean and Standard Deviation for the Three Tests
Based on this additional information, what statements can you make regarding
ZulindaÊs relative performance on each of these three tests? The following are
some conclusions you can make:
(a) Zulinda did best on the History test and her raw score of 72 falls at a point
one standard deviation above the mean;
(b) Her next best score is English and her raw score of 40 falls exactly on the
mean of the distribution of the scores; and
(c) Finally, even though her raw score for Science was 80, it falls one standard
deviation below the mean.
Converting ZulindaÊs raw scores into Z-scores, we can say that she achieved a:
10.3.1 Z-score
What is a Z-score? How do you calculate the Z-score? A Z-score is a type of
standard score. The term standard score is the general name for converting a
raw score to another scale using a predetermined mean and a predetermined
standard deviation. Z-scores tell how many standard deviations away from the
mean the score is located. Z-scores can be positive or negative. A positive Z-score
indicates that the value is above the mean, while a negative Z-score indicates that
the value falls below the mean. A Z-score is a raw score that has been transformed
or converted to a scale with a predetermined mean of 0 and a predetermined
standard deviation of 1. For instance, a Z-score of -6 means that the score is 6
standard deviations below the mean.
The formula used for transforming a raw score into a Z-score involves subtracting
the mean from the raw score and then dividing it by the standard deviation.
x x
Z
SD
For example, let us use this formula to convert KumarÊs marks of 52 obtained in a
Geography test. The mean for the test is 70 and the standard deviation is 7.5.
x x 52 70 18
Z 2.4
SD 7.5 7.5
The Z-score calculated for the raw score of 52 is -2.4 which means that KumarÊs
score for the Geography test is located 2.4 standard deviations below the mean.
Test 1 Test 2
Seng Huat 30 50
Mei Ling 45 35
Mean 42 47
Standard deviation 7 8
The teacher could use the mean to determine who was better. However, both
students have the same mean. How does the teacher decide? Using a Z-score can
tell the teacher how far from the mean are the scores of the two students and thus
who performed better. Using the formula, the teacher calculates the Z-score shown
as follows:
Upon examination of the calculation, the teacher finds that both Seng Huat and
Mei Ling have negative Z-scores for the total of both tests. However, Mei Ling has
a higher total Z-score (-1.07) compared with Seng HuatÊs total Z-score (-1.34). In
other words, Mei LingÊs total score was closer to the mean and therefore the
teacher concludes that Mei Ling did better than Seng Huat.
Z-scores are relatively simple to use but many educators are reluctant to use it,
especially when test scores are reported as negative numbers. How would you like
to have your mathematics score reported as -4? For this reason, alternative
standard score methods are used such as the T-score.
10.3.3 T-score
The T-score was developed by W. McCall in the 1920s and is one of the many
standard scores currently used. T-scores are widely used in the fields of
psychology and education, especially when reporting performance in
standardised tests. The T-score is a standardised score with a mean of 50 and a
standard deviation of 10. The formula for calculating the T-score is:
T = 10(z) + 50
For example, a student has a Z-score of -1.0 and after converting it to T-score, you
get the following:
T = 10 (z) + 50
= 10 (-1.0) + 50
= (-10) + 50
= 40
When converting Z-scores to T-scores, you should be careful not to drop the
negatives. Dropping the negatives will result in a completely different score.
ACTIVITY 10.2
Z-score T-score
+1.0
-2.4
+1.8
2. Why would you use T-scores rather than Z-scores when reporting
the performance of students in the classroom?
However, there will be a few men who will be relatively shorter and an equal
number who are relatively taller. By plotting the heights of all Malaysian men
according to the frequency of occurrence, you can expect to obtain something
similar to a normal distribution curve.
Besides height, normal distribution curve can seen in terms of IQ. Figure 10.6
shows a normal distribution curve for IQ based on the Wechsler intelligence scale
for children.
standard deviation -1 is also 34.13 per cent. Hence, the area between standard
deviation -1 and standard deviation +1 is 68.26 per cent. It means that in a normal
distribution, 68.26 per cent of individuals will score between standard deviations
-1 and +1.
On the other hand, the standard deviation score is the obtained score when we use
the standard deviation formula (which we discussed earlier). For example, if we
obtained a standard deviation of 5, then the score for one standard deviation is 5
and the score for two standard deviations is 10, the score for three standard
deviations is 15 and so forth. Standard deviation values of -1, -2 and -3 will have
corresponding negative scores of -5, -10 and -15.
Note, that in Figure 10.6, Z-scores are indicated from +1 to +4 and -1 to -4 with the
mean as 0. Each interval is equal to one standard deviation. Similarly, T-scores are
reported from 10 to 90 (interval of 10) with the mean set at 50. Each interval of 10
is equal to one standard deviation.
10.5 NORMS
In norm-referenced assessment, an individualÊs performance is evaluated in
relation to other peopleÊs performances. Norm-referenced tests are seldom used in
Malaysia but in the United States, standardised tests are widely used. Perhaps,
because of the decentralised education system in the United States, school-based
assessment is extensively practised. Unlike Malaysia, there are no national
examinations like the PMR and SPM in the United States. Hence, teachers there
who want to find out how their students are performing compared with other
students in the country rely on norm-referenced tests to compare the performances
of their students with the performances of other students in the norm group.
Norms are produced based on the norm sample. For example, if you have norms
of reading ability for children of different age groups, you will be able to compare
the performance of a seven-year-old in your class on the reading ability test with
the rest of the population. In other words, you can determine whether your seven-
year-old is reading at the level of other seven-year-olds in the country. In
establishing these norms, you have to ensure that the norm sample is
representative of the population.
Representativeness
When you compare your students with the rest of the population, you want to
ensure that the norm sample is representative. In other words, the individuals
tested in the norm sample must consist of the appropriate age group, taking into
consideration gender differences, geographic location and cultural differences.
For example, the eight-year-olds selected for the norm sample should reflect eight-
year-olds in the rest of the country according to gender (male and female),
geographic location (urban or rural) and cultural differences. For example, the
norm sample consists of 3,000 Malaysian primary school children with 500
students for each age group (seven-year-olds = 500 students, eight-year-olds = 500
and so forth). The norm sample should consist of children in all the states of
Malaysia, including all the ethnic groups in the country and be drawn from
different socioeconomic backgrounds and geographic locations. Based on the
norm sample of 3,000 primary school children, the following hypothetical norms
on reading ability in Bahasa Melayu for Malaysian children were produced (refer
to Table 10.3).
Percentile ranks (percentiles) are used in standardised tests which allow teachers
to compare the performance of their students with the norm group. An eight-year-
old student who obtained a score of 48 on the test has a percentile rank of 84. This
means that the student is reading at a level as well as, or better than,
84 per cent of other eight-year-old students in the test. Similarly, an eight-year-old
who obtains a percentile rank of 45 is reading as well as, or better than,
45 per cent of eight-year-olds in the norm sample.
To use norms effectively, you should be sure that the norm sample is appropriate,
both for the purpose of testing and for the person being tested. If you recognise
that the test norms are inadequate, you should be cautious because you may obtain
misleading information about the abilities of your students. The organisation
responsible for developing the norms should clearly state the groups tested
because you want to ensure that the norm sample is similar to your students. In
other words, the norm sample should consist of the same type of people in the
same proportion as is found in the population of reference. The norm sample
should be large enough to be stable over time.
SELF-CHECK 10.3
3. Do you think we should have standardised tests with norms for the
measurement of different kinds of abilities? Why?
The term „central tendency‰ refers to the „middle‰ value and is measured
using the mean, median and mode. It is an indication of the location of scores.
The mean is simply the sum of all the values (marks) divided by the total
number of items (students) in the set.
The range of scores in a test refers to the lowest and highest scores obtained in
the test.
Standard deviation refers to how much the scores obtained by students deviate
or differ from the mean.
A negative skew has a longer tail in the negative direction. A positive skew has
a longer tail in the positive direction.
The standard score refers to a raw score that has been converted from one scale
to another scale using the mean and standard deviation.
Z-score tells how many standard deviations away from the mean the score is
located.
The normal curve (also called the „bell curve‰) is a hypothetical curve that is
supposed to represent all natural occurring phenomena.