
LANGUAGE ASSESSMENT QUARTERLY, 5(1), 43–62, 2008
Copyright © Taylor & Francis Group, LLC
ISSN: 1543-4303 print / 1543-4311 online
DOI: 10.1080/15434300701776336

Using Microsoft Excel® to Calculate Descriptive Statistics and Create Graphs

Nathan T. Carr
California State University, Fullerton

Descriptive statistics and appropriate visual representations of scores are impor-


tant for all test developers, whether they are experienced testers working on
large-scale projects, or novices working on small-scale local tests. Many teachers
put in charge of testing projects do not know why they are important, however,
and are utterly convinced that they lack the mathematical ability to address the
matter anyway. This article begins by explaining why descriptives need to be cal-
culated in the first place, and then discusses ways in which to display data visu-
ally, and how to do this using Microsoft Excel® spreadsheet software. The article
then addresses three types of descriptive statistics: measures of central tendency,
measures of dispersion, and indicators of the shape of the distribution. It dis-
cusses some basic points about interpreting them and then provides simple
instructions for calculating them in Excel. The article assumes that readers are not
particularly familiar with Excel and does not assume a high level of mathematical
sophistication.

WHY DO WE NEED TO BOTHER WITH DESCRIPTIVES, ANYWAY?

It might be best to begin by discussing why we need to worry about descriptive


statistics and related graphical representations of scores in the first place. This
may be a relevant question for novice test developers, especially those involved
in constructing small-scale tests or tests to be used only in their own institutions.
Many language teachers who are developing tests for their institutions or even
just their own classes may find themselves wondering whether they really need
to. They may say that researchers should report these things when writing up

Correspondence should be addressed to Nathan T. Carr, Department of Modern Languages


and Literatures, California State University, 800 N. State College Boulevard, H-835, Fullerton,
CA 92831, USA. E-mail: ncarr@fullerton.edu

their results, and they may expect to see them when reading testing reports pro-
duced by high-priced Professional Testing Experts, but aside from calculating the
average score on a test, they may question the need for further statistical or
graphical description of scores. Such additional work is, no doubt, the province
of large testing organizations (see, e.g., Educational Testing Service, 2005). But
for locally developed tests, why bother, especially if it has never been done
before? On the other hand, others may recognize that they need to but find them-
selves unable to articulate any reasons why.
The easy answer is, “Well, we just do, because it is good testing practice, and
therefore one of our responsibilities as test developers.” In other words, it is part
of what is expected when you are conducting serious assessment, or want your
work to be taken seriously. But why is that, and why should it matter, really? No
doubt some will reply that perhaps they do not want to be so serious, thank you
very much. The official answer is that it is, in fact, necessary for discharging our
ethical duties as language testers: Principle 1 Annotation 5 and Principle 8 Anno-
tation 1 of the International Language Testing Association Code of Ethics (2000)
call, in part,1 for communicating information “in as meaningful a way as possi-
ble” and for doing so accurately.
There are several practical reasons for calculating descriptive statistics as
well, however. The first is that descriptive statistics let us know whether it is
appropriate to perform certain statistical tests (e.g., whether it is appropriate to
perform a t test to determine whether two groups performed differently to a sta-
tistically significant degree). Discussion of these statistical tests is beyond the
scope of this article, but they cannot be considered appropriately without first
paying attention to the concerns discussed here. Another practical reason for cal-
culating descriptives is that they also let us know which correlation coefficient
we can use appropriately on our test scores. This is particularly important when
it comes to deciding between the Pearson product–moment correlation coeffi-
cient (Pearson r) and Spearman rho (ρ).
Descriptive statistics are also used as a part of other statistical analyses that are
important to ensuring test quality, such as estimating test reliability (Bachman,
2004). For example, the formula for Cronbach’s alpha—an estimate of the score
consistency of a test—requires calculating the variance for total test scores as well
as for the scores on each individual item. Similarly, if we are interested in improving
reliability by revising problematic items, the most common way of estimating item

1. The full text of Principle 1, Annotation 5 is “Language testers shall endeavour to communicate
the information they produce to all relevant stakeholders in as meaningful a way as possible.” Princi-
ple 8, Annotation 1 reads “When test results are obtained on behalf of institutions (government
departments, professional bodies, universities, schools, companies) language testers have an obliga-
tion to report those results accurately, however unwelcome they may be to the test takers and other
stakeholders (families, prospective employers etc).”

difficulty (item facility) involves calculating the mean score for each item, as does
a family of well-known approaches to estimating item discrimination (upper–lower
method item discrimination, the difference index, and the B-index).
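To make the connection concrete, the standard formula for Cronbach’s alpha (reproduced here only to show where these descriptives enter, not as a procedure to be worked through in this article) is

\alpha = \frac{k}{k - 1} \left( 1 - \frac{\sum_{i=1}^{k} s_i^2}{s_X^2} \right),

where k is the number of items, s_i^2 is the variance of the scores on item i, and s_X^2 is the variance of the total test scores. Similarly, for an item scored 0 or 1, item facility is simply the mean of that item’s column of scores, which in Excel is nothing more than an = AVERAGE() formula applied to that column.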
Even more important, descriptive statistics give us basic information about
how people did on our tests. Presumably, we are interested in how people per-
formed overall. But “how the students did on the test” involves more than just
reporting the average score, as is explained next. Descriptives and related graphi-
cal representations of our results help us determine whether we have the distribu-
tion of scores that we expected—or even needed. In other words, is the test
functioning as we expected it to? Descriptive statistics offer a precise description
of this (Bachman, 2004), whereas graphs of the score distribution provide a more
intuitive and holistic description of score data.
For example, if we are administering a pretest in a language class (a criterion-
referenced pretest, such as a baseline measure intended to verify that students do
not already know the material about to be taught), we normally expect that the
majority of the examinees will score quite poorly, because they have not been
taught the material yet. Knowing whether this expectation actually holds true is
important; if it does not, the students may be in the wrong class, the course con-
tent may need to be revised so as to better meet their learning needs, or perhaps
the test was inappropriately constructed. In contrast, on a final exam (a criterion-
referenced post-test), we expect most of the students to have mastered the mate-
rial. If the descriptive statistics and graphs of the score distribution do not show
this, however, we will know that we need to revise the course content or teach-
ing, or revise the test. Similarly, if we have reason to expect a normal distribution
in our test scores (i.e., the classical “bell curve”), we can use descriptive statistics
and graphs to see how well our results match our expectations. Situations in
which this expectation might be reasonable include proficiency testing, as well as
any other test where we expect most people to be average, and equal numbers to
be above and below average.
Finally, descriptive statistics and graphical representations of data can be use-
ful when making comparisons between sets of test scores. Although there are sta-
tistical tests for doing this more precisely, we can still get some indication of the
degree of similarity or difference between groups by comparing descriptives and
graphs. Furthermore, even if we establish that a difference is statistically “signif-
icant”—large enough that it probably did not happen by chance—that does not
mean that the difference is large enough to really matter. Examining descriptive
statistics such as the means and standard deviations, and comparing graphs of
score distributions, can help us judge whether significant differences are in fact
meaningfully large. One example of when we might want to do this is when we
wish to compare how a group of students performed on two different forms
(i.e., versions) of a test. Another example might be when we wish to compare the
performance of different groups taking the same test.

PUTTING DATA INTO EXCEL

Before we can construct charts of our data, or calculate any descriptive statistics,
we first need to get our score data into an electronic format. The best way to do
this is with a spreadsheet program; most people will probably use Microsoft
Excel® spreadsheet software, although as Brown (2005) pointed out, the proce-
dures are highly similar regardless of the particular program being used. The only
differences of real importance here are that programs other than Excel (such as
version 15 of SPSS; see Bachman & Kunnan, 2005, for comparison) may use
slightly different names for the functions discussed here and that the procedures
for creating histograms and other graphs will also differ. As Brown also noted,
the procedures described here are virtually identical for all versions of Excel. For
those who are easily intimidated by math in general and statistics in particular, I
should make something clear right now: Excel is not a program that people have
to be good at math to use. Rather, it is a program that people use when they do
not want to do math themselves. None of the procedures outlined in this article
will require you to do anything beyond adding, subtracting, multiplying, or divid-
ing—and very little of that, too.
Unless the data are being imported from a text file, as would be the case with
tests that use optically scanned score sheets, results will probably have to be
entered by hand. The most accurate way to do this is to have one person reading
the names and data while a second person types everything. Not only is this
method usually faster, it also allows the person entering the data to watch the
screen at all times, increasing the likelihood that data entry errors will be caught
immediately. The nearly universal practice for arranging the file is to have each
examinee’s data in their own separate row in the spreadsheet, with each score—
whether for individual items, sections of the test, or just total test score—in its
own column. Even when there are multiple sources of data for each test taker, as
when a speaking test has been scored by two raters, each test taker should have
one row in the data set. In other words, a given examinee’s data should not be put
into more than one row. Aside from custom, there are practical reasons for
arranging the data this way, not least of which is that none of the procedures
described here will work otherwise.
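As a minimal sketch of this arrangement (the names and scores below are invented purely for illustration), a spreadsheet for a speaking test scored by two raters might begin like this:

      A            B         C         D
1     Examinee     Rater 1   Rater 2   Total
2     Student A    4         5         9
3     Student B    3         3         6
4     Student C    5         4         9

Each examinee occupies exactly one row, and each source of data (each rater, item, section, or total) occupies its own column.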

DISPLAYING DATA VISUALLY

Overview of Visual Representations of Data


Visual representations of data can be extremely useful. Interpreting them is much
more intuitive than interpreting statistics alone, although using the two formats
together provides the greatest clarity. Essentially, visual representations simply

TABLE 1
Example of a Frequency Distribution

Bin     Frequency
0       0
10      0
20      1
30      4
40      7
50      10
60      10
70      1
80      2
90      0
100     0

tell us how many test takers got each score. The first step in doing this is to com-
pile a frequency distribution, a table that shows how many test takers received each score
on the test. In the past, this might have been done by hand, with paper and pencil;
for example, for a test worth 0 to 100 points, a tally mark would be made next to
each possible score, and after all the tests had been gone through, the marks
would be totaled and reported in a table going from highest score to lowest
(Brown, 2005; Guilford & Fruchter, 1978). As Bachman (2004) pointed out,
however, unless we have a small sample, it is more informative to group scores
in the frequency distribution. Fortunately, the entire process can now be done in
moments using Excel, as will be explained next. An example of a frequency dis-
tribution can be seen in Table 1, which reports frequencies for a small simulated
data set of 35 cases, similar in size to what a classroom teacher might expect to
deal with.
Guilford and Fruchter (1978) pointed out the importance, when grouping
scores in a frequency distribution, of using appropriately sized intervals, which
Excel refers to as “bins.”2 They recommended using 10 to 20 intervals, with 10 to 15
being more common. They also recommended using certain sizes for intervals—
generally 2, 3, 5, 10, or 20 of whatever units are being used—and beginning each
interval with a number evenly divisible by the size being used (e.g., if 5-unit
intervals are used, then start each one with a number divisible by 5). Once the bin
size has been determined and the frequency distribution created, it can be
graphed. One way to do this is with a frequency polygon, which is basically a

2. I generally use the term bin instead of the better-sounding score interval in this article to remain consistent with the usage in Excel. If the term does not seem to make much sense, imagine that we are sorting potatoes, or buttons, and putting them into a number of storage bins based on their sizes.
FIGURE 1 Example of a frequency polygon.

line graph of the frequency distribution. An example can be seen in Figure 1.


Another type of graph that is even more commonly used is a histogram, which is
a bar graph of the frequency distribution (see Figure 2 for an example). Both of
the examples in Figures 1 and 2 are based on the frequency distribution in Table 1.
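As a brief illustration of these recommendations (using invented figures), suppose scores on a 100-point test range from 23 to 87. That span of roughly 65 points divided into 5-point intervals yields about 13 or 14 bins, comfortably within the recommended 10 to 20, and each interval would begin at a multiple of 5 (20, 25, 30, . . . , 85).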

Creating Histograms in Excel


The first step is making sure that the Analysis ToolPak (the add-in that provides the Data Analysis tools) has been installed
in Excel. This does not always require the installation disk for your copy of
Excel, although it seems that newer versions of Excel, or computers that have
recently installed Microsoft updates, may be more likely to require it. Because
this toolpack is not familiar to most Excel users, I briefly explain how to install it.

FIGURE 2 Example of a histogram.

Begin by clicking on the Tools menu. If you see “Data Analysis. . .,” the toolpack
is installed already. If it is not, click on “Add-Ins. . .” and select the “Analysis
ToolPak.” You will then be asked whether you want to install the feature; click
on “Yes” and follow any additional directions. You will not need to reboot your
computer or restart Excel when the installation is finished.
Once the toolpack is installed, you are ready to construct a histogram in Excel.
You begin by setting up the “bins.” Although this is theoretically optional, the
results will be much more useful if you set an appropriate bin size (see Figure 3
for an example of what happens when the bins are not specified in Excel and the
program is left to determine them itself). All this requires is finding an empty col-
umn in the spreadsheet and entering the interval boundaries in ascending order;
see the first column of Table 1 for an example. You do not need to create the fre-
quency distribution yourself, as Excel will do this for you automatically when it
creates the histogram. Once the bins have been created, go to Tools → Data
Analysis. . ., select the “Histogram” option, and click on the “OK” button.
The first text box is the input range; click on the button to its right (the one
with the red arrow), and the dialogue box will almost entirely disappear, aside
from a floating text box. This happens so that you can navigate the spreadsheet
and select the raw data for which you wish to construct a histogram. Once you
have selected the data, click on the button on the right edge of the floating text
box (the one with the red arrow). Then repeat the process for the bins range.
Another important part of the process is choosing from among the three
options for output location. Normally, it is better to select “Output Range” or
“New Worksheet Ply”; the former will put the histogram in the current work-
sheet, whereas the latter will create a new worksheet tab within the current work-
book (i.e., within the same Excel file). Choosing “New Workbook” will create a
new Excel workbook, which will probably be neither necessary nor useful for
most users. Finally, it is important to select the “Chart Output” checkbox, or
Excel will only produce a frequency distribution table, with no histogram. When
this is done, click “OK” and watch the histogram appear.
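As a rough sketch of what the completed dialogue box might contain (the cell ranges are merely an assumption: 35 scores in cells B2 through B36 and the 11 bin boundaries from Table 1 in cells D2 through D12):

Input Range:      $B$2:$B$36
Bin Range:        $D$2:$D$12
Output options:   New Worksheet Ply
[x] Chart Output

Your own ranges will, of course, depend on where the scores and the bins sit in your spreadsheet.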

FIGURE 3 Results of letting Excel determine the bins automatically (x-axis: Test Score, with automatically chosen bin boundaries such as 3.45, 14.91, 26.36, and so on; y-axis: Frequency).

Note that once one histogram has been created in a session of using Excel, the
next one will contain the previous settings as a default. It is also worth remem-
bering that the ever-popular “undo” button will not work with a histogram—once
created, the chart and the frequency distribution table must be deleted if there has
been a mistake.
Because the histogram is a chart, it can be reformatted like any other chart in
Excel. Resizing works the same as with any chart or picture in Microsoft Office®
System applications. Many users will probably want to revise the labels; for
example, “Bins,” the default label for the x-axis, should probably be replaced
with something more informative, such as “Test Score.” Likewise, unless the
document will be printed and copied in color, it is better to change all graphs to
black, white, and gray. The color of the bars can be changed by right-clicking on
one, selecting “Format Data Series. . .,” and changing the color settings on the
“Patterns” tab. I recommend a dark gray color for clarity. The color of the plot
area can be changed to white in a similar fashion after right-clicking an empty
place in the plot area and selecting “Format Plot Area. . .” The X or Y axis may
be formatted—including the direction of the text—by right-clicking on any of the
text labeling the axis and selecting “Format Axis. . .” Text such as the title and
legend of the graph can be deleted entirely, if desired, by simply clicking on the
box and hitting the Delete key. Text that is not deleted can be formatted by right-
clicking it, selecting the format option (“Format Axis. . . ,” “Format Axis Title. . . ,”
etc.), and clicking on the “Font” tab. One area for particular attention in the
“Font” tab is the “Auto scale” checkbox, which controls whether the text size
stays the same at all times or automatically adjusts as the chart is resized. It is
important to note that formatting must be applied separately for each text box
within the chart.
To add a trendline,3 right-click on one of the bars in the graph, and select
“Add Trendline. . .” Finally, a frequency polygon can be created by changing the
chart type. This is done by right-clicking one of the bars in the graph, selecting
“Chart Type. . .,” and in the “Standard Types” tab selecting “Line” as the chart
type and clicking on “Line with markers displayed at each data value” as the
chart subtype. Note that even if the color of the bars had been changed already,
converting to a frequency polygon or adding a trend line will produce a colored
line, which should then be reformatted to black and white. A slightly reformatted
example of a histogram can be seen in Figure 4, whereas Figure 5 shows the
same histogram with a trend line added. Figure 6 shows a frequency polygon for
the same variable.
To insert a chart into a Microsoft Word® document, simply click on the empty
space inside the borders of the chart—not a section with text or graphics—and

3. Note that Excel does not allow users to superimpose a normal curve over a histogram.
FIGURE 4 Histogram showing a relatively normal distribution.

FIGURE 5 Histogram showing a relatively normal distribution with a trend line added.

FIGURE 6 Frequency polygon corresponding to the histograms in Figures 4 and 5.

copy it. Open the Word document, put the cursor where you want to insert the
chart, and paste it in. It is important to note that all chart formatting should be
done in Excel first. Reformatting is not always possible in Word, and some
attempts at formatting in Word can even disrupt other, unrelated parts of the chart.
Therefore, once the chart is pasted into Word, you should plan to do no addi-
tional formatting beyond changing the size of the chart. If you do need to make
changes, make them in Excel, and then paste in the new version of the chart.

Figures 7 through 9 show histograms for 34, 194, and 991 cases, respectively.
Note that as the sample size increases, the shape of the histogram grows
smoother; that is, the larger the sample, the more closely it will tend to approximate the
shape of the underlying distribution (here, an approximately normal one). This illustrates the point that large samples tend to yield
smoother graphs, although they do not guarantee that you will obtain the

FIGURE 7 Example histogram with 34 cases and automatically scaled y-axis.

FIGURE 8 Example histogram with 194 cases and automatically scaled y-axis.

FIGURE 9 Example histogram with 991 cases and automatically scaled y-axis.

distributional shape—discussed next—that you were anticipating. It should be


further noted that these three histograms deliberately violate a basic principle of
graphically representing data: When graphs are being compared, they should all
be on the same scale; in these three figures, it appears at first glance that similar
numbers of test takers are involved, unless one pays close attention to the scale of
the (vertical) y-axis. When histograms or other types of charts are going to be
compared or considered together, therefore, they need to be placed on the same
scale. It can be challenging to find a scale that works for sample sizes that differ
as much as the ones we see here; however, as Figures 10 through 12 show, doing
so can illustrate important information that might otherwise be missed. Such
adjustments in scale are made by right-clicking the labels of the axis in question,
selecting “Format axis. . .,” clicking on the “Scale” tab, and adjusting the values
for minimum, maximum, and major unit. Minor units can usually be ignored, as
can the tick mark options on the “Patterns” tab. If the number labels do not fit on
the axis, it may be necessary to rotate the labels by reopening the “Format axis. . .”
dialogue box, clicking on the “Alignment” tab, and making the text vertical or
angled.4

WHAT DESCRIPTIVES DO YOU NEED TO CALCULATE?

Having addressed why it is important to calculate descriptive statistics, the next


logical step is to identify which ones we need to report. Descriptives can be
divided into three groups: indexes that describe the shape of the distribution,
measures of central tendency, and measures of dispersion.

Indexes That Describe the Shape of the Distribution


When using statistics to describe the shape of a distribution, we report the skew-
ness (also frequently called the skew) and the kurtosis. These two statistics are
probably the least familiar to beginners, but both are relatively simple to grasp.
Skewness, as its name suggests, indicates how far a distribution is “skewed” off-
center. A perfectly normal distribution—the bell curve that is approximated in
Figure 13—has a skewness of zero. Figures 14 and 15, on the other hand, provide
examples of large positive and negative skewness, respectively. Many people
have trouble at first keeping straight which direction is negative and which is
positive, because the rule can seem counterintuitive at first. The thing to remember

4. When the last number on the axis has more digits than the others, the final digit sometimes may not display if the labels are oriented vertically. The solution to that problem is to put them at a slight angle, as in many of the examples in this article, which use a −75° orientation for this very reason.
FIGURE 10 Example histogram with 34 cases and y-axis on the same scale as Figures 11 and 12.

FIGURE 11 Example histogram with 194 cases and y-axis on the same scale as Figures 10 and 12.

FIGURE 12 Example histogram with 991 cases and y-axis on the same scale as Figures 10 and 11.
FIGURE 13 Histogram showing a near-perfect normal distribution.

FIGURE 14 Histogram of a positively skewed distribution.

FIGURE 15 Histogram of a negatively skewed distribution.

is that “the tail tells the tale.” In other words, the “tail” points in the direction of
the sign. The sign is not determined by which side the “hump” is on.
The other statistic used to describe the shape of the distribution is the kurtosis.
This tells us how flat or peaked the distribution is. A perfectly normal distribu-
tion has a kurtosis of zero; a distribution in which the scores are clustered tightly
FIGURE 16 Histogram of a distribution with positive kurtosis.

FIGURE 17 Histogram of a distribution with negative kurtosis.

together (see Figure 16) has positive kurtosis, whereas one in which they are
spread out (see Figure 17) has negative kurtosis. If this seems difficult to keep
straight at first, remember that high, peaked distributions are positive, and low,
flattened-out ones are negative.

Measures of Central Tendency


Measures of central tendency (or measures of grouping; Bachman, 2004), as the
term suggests, tell us where the middle values in our data are located. There are
three that are commonly reported: the mean, the median, and the mode. The
mean is usually described as the arithmetic average5 of the scores. The median is
the middle score—that is, half of the scores are above it, and half are below it,
meaning that it is at the 50th percentile. When there is an even number of scores,

5. People taking their first—or second, for that matter—course in testing, statistics, or research methods often wonder what the difference is between an arithmetic average and the “ordinary” averages they learned to calculate in elementary school. There is no difference.

the median is the average of the middle two scores. The third measure of central
tendency, the mode, is simply the most common score.
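As a quick worked example, consider the ten scores used later in this article for Table 2: 3, 30, 38, 38, 45, 50, 56, 57, 63, and 79. Their sum is 459, so the mean is 459 / 10 = 45.9; with an even number of scores, the median is the average of the fifth and sixth scores, (45 + 50) / 2 = 47.5; and the mode is 38, the only score that occurs twice.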
These three measures provide differing levels of information and are useful in
different contexts. The mode provides us with the least information of the three—
knowing the most common score is nice, but it does not necessarily tell us much
about what is going on with everyone else who took our test. The mode is more
useful in cases where the variable being described is not a score, but a category—
for example, when we are classifying students by first language. As Bachman
(2004) pointed out, the median is particularly useful when the distribution is
skewed—extreme values in the “tail” of the distribution will have a disproportion-
ate effect on the mean—and in small samples, although the latter reason is largely
because small samples are unlikely to have a normal distribution. It is also appro-
priate for rankings and for scores where the distance between levels is not necessar-
ily the same in every case (e.g., if an essay test is rated using a 5-point rating scale,
the difference in quality between a 2 and 3 might not be the same as the difference
between a 3 and 4). The mean is appropriate for any case in which the variable is
not highly skewed and the distances between levels of the variable are equivalent;
this is usually the case when test scores are based on a number of items.
In a normal distribution, the three measures of central tendency should be grouped
or clustered together fairly closely. If the distribution is skewed, they will be farther
apart. In particular, the larger the skewness is, the greater the distance will be between
the mean and the median. The median will be closer to the center of the “hump” of
the distribution, and the mean will be closer to the tail. The mode, of course, will be at
the tallest point of the “hump,” because that represents the most common score.

Measures of Dispersion
Measures of dispersion, as their name suggests, indicate how spread out scores
are for a particular variable. They include the standard deviation, variance, semi-
interquartile range, and range. As with the measures of central tendency, these
indexes provide varying amounts of information about the data they describe.
The standard deviation is the most informative measure of dispersion and is
appropriate any time that it is appropriate to use the mean. To understand what
the standard deviation is, it is useful to keep in mind that on a given test, very few
test takers will receive scores exactly equal to the mean; that is, there will be
some difference between each examinee’s score and the mean. Conceptually, the
standard deviation is similar6 to the average of these differences.

6. Strictly speaking, the standard deviation is not really the average of the differences, which is referred to as the mean deviation (Gorard, 2004). The mean deviation is used so seldom, however, that thinking of the standard deviation as the average of the differences is unlikely to cause any problems.

Further complicating the picture is that there are two formulas for the standard
deviation: one for when we are calculating the standard deviation for a sample,
and one for when we are calculating the standard deviation for the entire
population of interest. The population formula for the standard deviation is

S = \sqrt{\frac{\sum (X - M)^2}{N}}, and the sample formula is s = \sqrt{\frac{\sum (X - M)^2}{n - 1}} (Brown, 2005),
where X is an individual test taker’s score, M is the mean, and N or n is the popu-
lation or sample size, respectively.7 The formulas yield almost identical results if
the number of cases is large, but in small groups, the difference can be notice-
able. If the test takers whose data are being analyzed are all of the test takers who
could be expected to take that test, then the population formula should be used—
as, for example, when all of the students in a particular language program are
included in the analysis (Brown, 2005). This issue arises again in the context of
estimating test reliability, but further discussion of the matter lies beyond the
scope of this article.
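A tiny invented example may help make the difference between the two formulas concrete. Suppose only three examinees take a quiz and score 2, 4, and 6. The mean is 4, and the squared differences from the mean are 4, 0, and 4, summing to 8. Treating the three as the entire population gives S = √(8/3) ≈ 1.63, whereas treating them as a sample gives s = √(8/2) = 2. As the number of cases grows, the gap between the two values shrinks.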
The variance is simply the square of the standard deviation; therefore, as there
are two formulas for the standard deviation, there are also two for the variance—
that is, for the variance of a population and the variance of a sample. The vari-
ance is not very useful in and of itself, but it is used in calculating a number of
other statistics, such as Cronbach’s alpha, an estimate of internal consistency reli-
ability (Allen & Yen, 1979).
The semi-interquartile range (Bachman, 2004) is based on the notion of quartiles,
divisions of the scores into four equally sized groups. Also referred to as the
quartile deviation, it is the average of the differences between the median (the
50th percentile) and the 25th and 75th percentiles; that is, between the second
quartile and the first and third quartiles, respectively. Its calculation is very straightforward once
the values of the first and third quartiles are calculated: Q = \frac{Q_3 - Q_1}{2}. Fortunately,
finding these values is very simple in Excel. The semi-interquartile range should
be reported any time the median is used.
The range is probably the simplest of the indicators of dispersion and is equal
to the highest score minus the lowest score, plus 1. As Bachman (2004) noted,
although it is the simplest of these indicators, it is also the least informative, as
distributions with widely varying shapes may all have the same range. This is the
case, in fact, in Figures 13 through 17.
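Using the ten-score example data from Table 2 as a worked illustration, the third and first quartiles are 56.8 and 38, so the semi-interquartile range is Q = (56.8 − 38)/2 ≈ 9.4; the highest and lowest scores are 79 and 3, so the range is 79 − 3 + 1 = 77. Both values match those reported in Table 2.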

7. Note the use of capital S in the population formula, and lowercase s for the sample formula. The abbreviation SD is also commonly used, and does not specifically refer to either the population or sample version.

What Are “Good” Values for Descriptive Statistics, and How Normal Is
Normal Enough?
People applying these statistics for the first time commonly wonder what
“good” values are for descriptive statistics. There is no single
answer to this, as it depends on the distribution that is expected or needed in a
particular situation. For example, we would expect scores on a proficiency test to
be normally distributed (see Figure 13 for an example of an approximately nor-
mal distribution). In most cases, we might expect the same for a placement test,
even if it is criterion referenced. A normal distribution is also an assumption for
certain statistical tests. On the other hand, in a criterion-referenced pretest, given
to learners to assess their knowledge of something that has not been taught yet,
we generally expect a positively skewed distribution (see Figure 14). That is
because most of the test takers will have very low scores, although a few may
already know the material they are about to be taught, and will thus score higher
than the others. Similarly, a criterion-referenced posttest should have a nega-
tively skewed distribution (see Figure 15), because the vast majority of the stu-
dents will (we hope!) have mastered the content, and only a few will have very
low scores.
When we do not get the distribution we expect, there is something wrong. The
problem may lie with the test itself, or there may have been something problem-
atic about our assumptions. To learn which it was will require gathering addi-
tional information about our students, further analyzing the test (e.g., item and
reliability analyses), or both. That is why we need to calculate descriptive statis-
tics and create graphs of score distributions: to tell us whether our tests are func-
tioning as we expected or not.
So how normal is “normal enough,” and how skewed should we expect our
pretests and posttests to be? Bachman (2004, p. 74) advised that as long as the
skewness and kurtosis values are between −2 and +2, the distribution is “reason-
ably normal,” meaning that it would be appropriate to perform analyses that
require normality to be appropriate (e.g., calculating the Pearson r, or performing
a t test). On the other hand, that does not automatically mean that a criterion-
referenced pretest should have a skewness of at least +2, or that a criterion-
referenced posttest should have at least a −2 skew. There are no rules of thumb of
which I am aware for these values; I therefore recommend looking at not only the
skewness statistic but also a histogram or frequency polygon of the scores to see
whether it has a shape that seems reasonable in light of the content being tested
and what you expect the students to know already when they take the test.
Finally, skewness and kurtosis are probably the most commonly misinter-
preted and overinterpreted statistics discussed in this article. In particular, it is
important for beginners to keep in mind that any distribution found in real life
will have some degree of positive or negative skewness and kurtosis. Thus, having

TABLE 2
Functions and Formulas for Calculating Descriptive Statistics in Microsoft
Excel Spreadsheet Software

Statistic                                       Excel Function or Formula    Result for the Example Data Set
Mean (M)                                        = AVERAGE(B2:B11)            45.9
Median (Mdn)                                    = MEDIAN(B2:B11)             47.5
Mode                                            = MODE(B2:B11)               38
High score                                      = MAX(B2:B11)                79
Low score                                       = MIN(B2:B11)                3
Range                                           = B16−B17+1                  77
Third quartile (Q3)                             = QUARTILE(B2:B11, 3)        56.8
First quartile (Q1)                             = QUARTILE(B2:B11, 1)        38
Semi-interquartile range (Q)                    = (B19−B20)/2                9.4
Standard deviation (sample) (s, sd_sample)      = STDEV(B2:B11)              20.7
Standard deviation (population) (S, sd_pop)     = STDEVP(B2:B11)             19.6
Variance (sample) (s², var_sample)              = VAR(B2:B11)                427.7
Variance (population) (S², var_pop)             = VARP(B2:B11)               384.9
Skewness                                        = SKEW(B2:B11)               −0.632
Kurtosis                                        = KURT(B2:B11)               1.354

a minor negative skewness (e.g., –0.034) does not necessarily suggest that a test
was a post-test. The same holds true for minor positive skewness and pretests.

Calculating These Statistics in Excel


Calculating the measures of central tendency and dispersion just discussed is quite
easy in Excel. In most cases, Excel already contains a function that calculates the
desired statistic. In the case of the range and semi-interquartile range, however, there
is no built-in function, so users must instead use the functions Excel does have to calculate the
highest score, lowest score, first quartile (Q1), and third quartile (Q3). These values
are then used to calculate the desired statistics. The functions and formulas used are
given in Table 2, along with the results for this sample data set.8 The ranges used in
the functions are based on the example in Figure 18, in which descriptives are calcu-
lated using a data set with 10 cases located in cells B2 through B11.
When using a function in Excel, there are two ways to proceed. The faster way is to
type in the function as written in Table 2, starting with the “equals” sign. After typing
the opening parenthesis, move the cursor to the top of the range (group of contiguous
cells) containing the scores. Click on that cell, and then select the entire range. You can

8. For those having trouble reading the scores in the figure, they are 45, 50, 38, 79, 56, 38, 57, 3, 30, and 63.

FIGURE 18 Example of calculating the mean in Microsoft Excel spreadsheet software.


Note. Microsoft Excel screen shot reprinted with permission from Microsoft Corporation.

select the range by holding down the mouse button and dragging the cursor down to the
bottom of the data, or by holding down the Shift key and pressing the down arrow (↓)
or Page Down key on the keyboard. When the entire range is selected, type the close
parenthesis, and hit the Enter key. Note that the functions are not case sensitive.
When a user is less confident, however, or cannot remember exactly how a
function works, it is possible to click on Insert → Function. . ., and find the desired
function there. Something that may prove initially confusing is that for many func-
tions, Excel has two text boxes, labeled “Number 1” and “Number 2.” Simply
ignore the second one, and click on the button to the right of the first text box (the
one with the red arrow). As with creating histograms, most of the dialogue box will
disappear, and you select the range containing the data. After finishing, press the
Enter key, or click on the button to the right of the floating text box (again, the but-
ton with the red arrow). Click OK, and the function is calculated.
When entering the formulas for the range and semi-interquartile range, instead
of typing the specific cell addresses given in Table 2, it is necessary to use the
addresses in your own spreadsheet. When it is time to type a cell address,
use the mouse to click on the desired cell (e.g., the cell containing the value for
Q3), and then continue typing—that is, do not then click on the cell where you are

entering the formula. When the formula is finished, hit the Enter key. As a final
point, the spaces in the formulas are optional.
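As a sketch of how the finished block of descriptives might be laid out (the row numbers are an assumption chosen to be consistent with the cell addresses in Table 2, with the scores themselves in B2:B11; your own rows will differ), the relevant part of the worksheet could look like this:

      A                               B
16    High score                      = MAX(B2:B11)
17    Low score                       = MIN(B2:B11)
18    Range                           = B16-B17+1
19    Third quartile (Q3)             = QUARTILE(B2:B11, 3)
20    First quartile (Q1)             = QUARTILE(B2:B11, 1)
21    Semi-interquartile range (Q)    = (B19-B20)/2

Here the range formula in B18 refers to the MAX and MIN results above it, and the semi-interquartile range formula in B21 refers to the two QUARTILE results, exactly as described above.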

CONCLUSION

In summary, it is important that we calculate descriptive statistics for our tests


and that we create visual representations of our data. Only by doing both will we
gain—or provide to others—a full picture of what is going on with our tests.
Many teachers and others responsible for local test development have probably
long avoided taking these essential steps, though, because of unfamiliarity with
statistics or a lack of access to statistical analysis software. As has just been
shown, however, the statistics required are not actually that complex, do not
require a high level of mathematical sophistication, and can be calculated easily
using one of the most ubiquitous computer programs in the world.

ACKNOWLEDGMENT

I thank two anonymous reviewers for the feedback and encouragement that they
offered on a previous draft of this article. Any remaining shortcomings are, of
course, my own responsibility.

REFERENCES

Allen, M. J., & Yen, W. M. (1979). Introduction to measurement theory. Monterey, CA: Brooks/Cole.
Bachman, L. F. (2004). Statistical analyses for language assessment. Cambridge, UK: Cambridge
University Press.
Bachman, L. F., & Kunnan, A. J. (2005). Workbook and CD for statistical analyses for language
assessment. Cambridge, UK: Cambridge University Press.
Brown, J. D. (2005). Testing in language programs: A comprehensive guide to English language
assessment (2nd ed.). New York: McGraw-Hill.
Educational Testing Service. (2005). TOEFL test and score data summary: 2004–2005 test year data
(Report No. TOEFL-SUM-0405-DATA). Princeton, NJ: Author. Retrieved December 17, 2006,
from http://www.ets.org/Media/Tests/TOEFL/pdf/Test%20and%20Score%20Data%20Summary
%2004_05.pdf
Gorard, S. (2004, September). Revisiting a 90-year-old debate: The advantages of the mean devia-
tion. Paper presented at the British Educational Research Association Annual Conference, Univer-
sity of Manchester, England. Retrieved December 20, 2006, from http://www.leeds.ac.uk/educol/
documents/00003759.htm
Guilford, J. P., & Fruchter, B. (1978). Fundamental statistics in psychology and education (6th ed.).
New York: McGraw-Hill.
International Language Testing Association. (2000). Code of ethics. Retrieved December 17, 2006,
from http://www.iltaonline.com/code.pdf
