Download as docx, pdf, or txt
Download as docx, pdf, or txt
You are on page 1of 3

LETTER VALUE

The letter value summary was introduced by John Tukey and extends the boxplot’s 5
number summary by exploring the symmetry of the batch for depth levels other than the half
(median) or the fourth (quartiles).

Constructing the letter value summaries


Let’s start with a simple batch of numbers: 22, 8, 11, 3, 26, 1, 14, 18, 20, 25, 24.

Order the values

First, we order the numbers from lowest to highest.

Find the median (M)


Next, we find the median. It’s the location in the enumerated values that splits the batch into
two equal sets of values. If the number of values is odd, then the median is the value furthest
from the ends. If the number of values in the batch is even, then the median is the average of
the two middle values, each furthest from its end. A simple formula to identify the element
number (or depth) of the batch associated with the median is:

𝑑𝑒𝑝𝑡ℎ 𝑜𝑓 𝑚𝑒𝑑𝑖𝑎𝑛 = 𝑛 + 12

where n is the number of values in the batch. In our example, we have 11 values, so the the
median is (11 + 1)/2 or 6; it’s the 6th element from the left (or the right) of the sorted values.

If our batch consisted of an even number of values such as 10, the the median would be the
5.5th value which does not coincide with an existing value. This would require that we find
the 5th and 6th element in the batch, then compute their average to find the median.

The median is the value furthest from the extreme values; it’s said to have the greatest depth
(e.g. a depth of 6 in our example). The minimum and maximum values have the lowest depth
with a depth value of 1, each.
Find the hinges (H)

Next, we take both halves of our batch of ordered numbers and find the middle of each.
These mid points are referred to as hinges. They can be easily computed by modifying the
formula used to find the median: we simply substitute the value n with the depth associated
with the median, d(M) (i.e. the median becomes an extreme value in this operation).

𝑑(𝑀) + 1
𝑑𝑒𝑝𝑡ℎ 𝑜𝑓 𝑒𝑖𝑔ℎ𝑡𝑠 =
2
In our working example, the depth of the median is 6, therefore the depth of the hinge is
(6+1)/2 = 3.5. So the hinge is the 3.5th element from the left (or right) of the first half of the
batch and the 3.5th element from the left (or right) of the second half of the batch. Since the
depth does not fall on an existing value, we need to compute it using the two closes values
(depth 3 and depth 4). This gives us (8+11)/2=9.5 for the left hinge and (22+24)/2=23 for the
right hinge.

If our batch consisted of even number of values, we would need to drop the ½ fraction from
depth of the median before computing the depth of the hinge. For example, if we had 10
values the depth of the median would be 5.5 and the depth of the hinge would be calculated
as (5+1)/2.

Note that the hinges are similar to the quartiles but because they are computed differently,
their values may be slightly different.

Find the other letter summaries (E, D, C, B, A, etc…)

So far, we’ve found the median (M) and the hinges (H). We keep computing the depths for
each outer group of values delimited by the outer extreme values and the previous depth. For
example, the mid-point of the outer quarters gives us our eights (E):

𝑑 (𝐻 ) + 1
𝑑𝑒𝑝𝑡ℎ 𝑜𝑓 𝑒𝑖𝑔ℎ𝑡𝑠 =
2
or, after dropping the ½ fraction from the depth of the hinge, (3+1)/2=2.

This continues until we’ve exhausted all depths (i.e. until we reach a depth of 1 associated
with the minimum and maximum values). Once past the eight, we label each subsequent
depths using letters in reverse lexicographic order starting with D (for sixteenth) then C, B,
A, Z, Y, etc…
In our working example, we stop at a depth of D (though some will stop at a depth of two and
only report the extreme values thereafter).

The mids and spreads

Once we’ve identified the values associated with each depth, we compute the middle value
for each depth pair. For example, the middle value for the paired hinges is 16.25; the middle
value for the paired eights is 14; and so on. We can also compute the spread for each depth
by computing the difference between each paired value.

The letter value summary is usually reported in tabular form:

Letter Depth Lower Mid Upper Spread

M 6.0 18.0 18.00 18.0 0.0

H 3.5 9.5 16.25 23.0 13.5

E 2.0 3.0 14.00 25.0 22.0

D 1.5 2.0 13.75 25.5 23.5

C 1.0 1.0 13.50 26.0 25.0

You might also like