
Boxplot

https://www.statology.org/boxplots/
A boxplot, sometimes called a box-and-whisker plot, is a plot that visualizes the five-number summary of a
dataset. The box is constructed from the quartiles, and the whiskers are lines that extend from the quartiles to
the minimum and maximum values, excluding any outliers. The top whisker represents the maximum, the top of
the box represents the 3rd quartile, the middle line in the box represents the median, the tiny “x” in the box
represents the mean, the bottom of the box represents the 1st quartile, and the bottom whisker represents the
minimum value.

Visualizing a Five Number Summary Using a Boxplot

One of the easiest ways to visualize a five number summary is by creating a boxplot, sometimes called a box-
and-whisker plot, which uses a box with a line in the middle along with “whiskers” that extend on each end.

A box plot provides a pictorial representation of the following statistics: maximum, 75th percentile, median
(50th percentile), mean, 25th percentile and minimum. Box plots are especially useful when comparing samples
and testing whether data is distributed symmetrically.
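
The statistics listed above can be computed directly. The following is a minimal sketch using only Python's standard library; the function name `box_plot_statistics` and the sample values are hypothetical, and the inclusive quartile method is an assumption (other quartile conventions give slightly different box edges).

```python
import statistics

def box_plot_statistics(values):
    """Return the statistics a box plot displays for one sample."""
    data = sorted(values)
    # Assumption: inclusive quartiles; an exclusive method would shift Q1 and Q3.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "minimum": data[0],
        "q1": q1,
        "median": median,
        "mean": statistics.mean(data),
        "q3": q3,
        "maximum": data[-1],
    }

# Hypothetical sample, not taken from any figure in this document.
scores = [2, 4, 4, 5, 7, 9, 10, 12, 15]
summary = box_plot_statistics(scores)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Here the mean falls above the median, which is the kind of difference the “x” marker makes visible on the chart.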

Using Boxplots to Graph the Interquartile Range

Boxplots are a great way to visualize interquartile ranges and their relation to the median and the overall
distribution. These graphs display ranges of values based on quartiles and show asterisks for outliers that fall
outside the whiskers. Boxplots work by splitting your data into quarters.
The box in the boxplot is the interquartile range. It contains the middle 50% of the data. By comparing the size of
these boxes, you can understand your data’s variability. More dispersed distributions have wider boxes.
Additionally, find where the median line falls within each interquartile box. If the median is closer to one side or
the other of the box, the distribution is skewed. When the median is near the center of the interquartile range,
the distribution is symmetric.
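
As a rough numerical counterpart to reading the box by eye, the sketch below computes the interquartile range and the relative position of the median inside the box. The helper name `box_shape`, the two samples, and the use of inclusive quartiles are all assumptions made for illustration.

```python
import statistics

def box_shape(values):
    """Return (IQR, relative position of the median inside the box)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    # 0.5 means the median is centred in the box; values near 0 or 1
    # mean it sits close to Q1 or Q3, which suggests skew.
    position = (median - q1) / iqr if iqr else 0.5
    return iqr, position

# Hypothetical samples.
symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]
right_skewed = [1, 2, 2, 3, 3, 4, 8, 12, 20]

print(box_shape(symmetric))
print(box_shape(right_skewed))
```

A median position well below 0.5, as in the second sample, means the median sits near the bottom of the box, which is typical of right-skewed data.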
For example, in the boxplot below, method 3 has the highest variability in scores and is left-skewed. Conversely,
method 2 has a tighter distribution that is symmetrical, although it also has an outlier—read the next section for
more about that!

You can perform the following steps to create a boxplot in Excel:

Step 1: Highlight the data values.


Step 2: In the Insert tab in the Charts group along the top ribbon, click the tiny arrow in the bottom left corner to
“See All Charts.”

Step 3: Select “Box & Whisker” and click OK.


A box and whisker plot will automatically be displayed.
The top whisker represents the max, the top of the box represents the 3rd quartile, the middle line in the box
represents the median, the tiny “x” in the box represents the average, the bottom of the box represents the 1st
quartile, and the bottom whisker represents the minimum value:
You can change the background colour and the chart title as well to make it more aesthetically pleasing:

Real Statistics Data Analysis Tool

Example 1

A market research company asks 30 people to evaluate three brands of tablet computers using a questionnaire.
The 30 people are divided at random into 3 groups of 10 people each, where the first group evaluates Brand A,
the second evaluates Brand B and the third evaluates Brand C. Figure 1 summarizes the questionnaire scores
from these groups.

To generate the box plots for these three groups, press Ctrl-m and select the Descriptive Statistics and
Normality data analysis tool. A dialog box will now appear as shown in Figure 4 of Descriptive
Statistics Tools. Select the Box Plot option and insert A3:C13 in the Input Range. Check Headings
included with the data and uncheck Use exclusive version of quartile.
The resulting chart is shown in Figure 2.

Figure 2 – Box Plot

Box Plot Output

Note too that the data analysis tool also generates a table, which may be
located behind the chart. For those who are interested, this table
contains the information in Figure 3, as explained further in Special
Charting Capabilities.

For each sample, the box plot consists of a rectangular box with one line
extending upward and another extending downward (usually called
whiskers). The box itself is divided into two parts. In particular, the
meaning of each element in the box plot is described in Figure 3.

Element                        Meaning
Top of upper whisker           Maximum value of the sample
Top of box                     75th percentile of the sample
Line through the box           Median of the sample
Bottom of the box              25th percentile of the sample
Bottom of the lower whisker    Minimum of the sample
× markers                      Mean of the sample

Figure 3 – Box Plot elements

There are two versions of this table, depending on whether you check
or uncheck the Use exclusive version of quartile field. If
checked then the QUARTILE.EXC version of the 25th and 75th percentile
is used (or QUARTILE_EXC for Excel 2007 users), while if this field is
unchecked then the QUARTILE.INC (or equivalently the QUARTILE)
version is used. See Ranking Functions in Excel for more details about
the difference between these two versions.
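
To see how much the two quartile versions can differ, here is a small sketch using Python's `statistics.quantiles`. Its `"exclusive"` and `"inclusive"` methods use, to my understanding, the same interpolation as QUARTILE.EXC and QUARTILE.INC respectively; treat that correspondence as an assumption and check it against Excel for your own data. The sample is hypothetical.

```python
import statistics

# Hypothetical sample of eight values.
data = [1, 2, 3, 4, 5, 6, 7, 8]

# Exclusive method: quartile positions are based on n + 1.
quartiles_exc = statistics.quantiles(data, n=4, method="exclusive")

# Inclusive method: quartile positions are based on n - 1.
quartiles_inc = statistics.quantiles(data, n=4, method="inclusive")

print("exclusive:", quartiles_exc)  # [2.25, 4.5, 6.75]
print("inclusive:", quartiles_inc)  # [2.75, 4.5, 6.25]
```

The median is the same in both cases, but the exclusive version places the 25th and 75th percentiles further apart, so the box is longer.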

From the box plot in Figure 2, we can see that the scores for Brand C
tend to be higher than for the other brands and those for Brand B tend to
be lower. We also see that the distribution of Brand A is pretty
symmetric at least in the range between the 1st and 3rd quartiles, although
there is some asymmetry for higher values (or potentially there is an
outlier). Brands B and C look less symmetric. Because of the long upper
whisker (especially with respect to the box), Brand B may have an outlier
(see Outliers and Robustness for a discussion of outliers).

Another indication of symmetry is whether the × marker for the mean
coincides with the median.
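
The same check can be made numerically by comparing the mean with the median. This is only a sketch: the function name `mean_median_gap` and both samples are hypothetical, and a gap of zero does not by itself prove that a distribution is symmetric.

```python
import statistics

def mean_median_gap(values):
    """Return mean minus median; a value near zero is consistent with symmetry."""
    return statistics.mean(values) - statistics.median(values)

# Hypothetical samples.
balanced = [1, 2, 3, 4, 5, 6, 7, 8, 9]
long_upper_tail = [1, 2, 2, 3, 3, 4, 8, 12, 20]

gap_balanced = mean_median_gap(balanced)
gap_tailed = mean_median_gap(long_upper_tail)
print(gap_balanced, gap_tailed)
```

A clearly positive gap, as in the second sample, means the × marker would sit above the median line.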

Alternative Representation

We can also convert the box plot to a horizontal representation of the
data (as shown in Figure 4) by first deleting the markers for the means
(by clicking on any of these markers and pressing the backspace key) and
then clicking on the chart and selecting Insert > Charts|Bar >
Stacked Bar.

Figure 4 – Horizontal Box Plot


Box Plot with Negative Data Values

When a data set has one or more negative values, the y-axis will be
shifted upward by the amount of -MIN(R1). Here, R1 is the data range
containing the data. Thus if R1 ranges from -10 to 20, the range in the
chart will range from 0 to 30.
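
The shift described above amounts to adding -MIN(R1) to every value. The sketch below illustrates that arithmetic only; the function name `shift_for_chart` and the sample values are hypothetical, and it is not the tool's actual implementation.

```python
def shift_for_chart(values):
    """Shift data upward by -min(values) when the minimum is negative."""
    lowest = min(values)
    # No shift is applied when all values are already non-negative.
    offset = -lowest if lowest < 0 else 0
    return offset, [v + offset for v in values]

# Hypothetical data ranging from -10 to 20, as in the text.
offset, shifted = shift_for_chart([-10, 0, 5, 20])
print(offset)   # 10
print(shifted)  # [0, 10, 15, 30]
```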

Example 2: Create the box plot for the data in Figure 5.9.1 where cell
B11 is changed to -300, using the exclusive version of the quartile function.

The procedure is the same as for Example 1, except that this time we
check the Use exclusive version of quartile option. The output is
shown in Figure 5.

The key difference is that since the smallest data value is -300 (the value
in cell F13), all the box plot values are shifted up by 300. This is evident
by noting that the lower tail for Brand B is at 0 instead of -300 (and that
cell G6 contains 0 instead of -300).

Figure 5 – Box plot for negative data

Note that two y-axes are displayed. The one on the left is based on the
displacement of 300 units, while the one on the right shows the correct
units.

Removing one y-axis

You can remove the y-axis on the left using the following steps:
1. Select the y-axis on the left and then right-click.
2. Choose the Format Axis… option from the menu that appears.
3. When the menu of options appears as shown in Figure 6, change the
Label Position option from Next to Axis to None.

Figure 6 – Remove left y-axis

Note that if you change any of the data elements, the box chart will still
be correct, although the right y-axis will not change and will still reflect
the original data, and so you will need to rely on the left y-axis (you can
remove the right y-axis as described above for the left y-axis).

More Information about Box Plots

See Box Plots with Outliers to see how to generate box plots in Excel
which also explicitly show outliers. The following two versions are
described:
• An Excel charting capability that is available for versions of
Excel starting with Excel 2016
• An extended version of the Real Statistics data analysis tool
described above. This tool is available even for versions of
Excel prior to Excel 2016.
See Special Charting Capabilities for how to create a box plot manually,
using only Excel charting capabilities.

Box Plots with Outliers


Excel Box and Whiskers Chart

Starting with Excel 2016 Microsoft added a Box and Whiskers chart capability. To create a box plot, highlight the
data range A2:C11 and select Insert > Insert Statistic Chart > Box and
Whisker. The boxplot will appear:

You can add a legend as well as chart and axis titles. The box part of the chart is as described above. The mean
is shown as an ×. The whiskers extend up from the top of the box to the largest value that is less than or equal to
the 3rd quartile plus 1.5 times the interquartile range (IQR), and down from the bottom of the box to the smallest
value that is greater than or equal to the 1st quartile minus 1.5 times the IQR. Values outside this range are
considered to be outliers and are represented by dots.
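
The whisker and outlier rule can be sketched in a few lines. The example assumes the usual convention that the limits sit at Q1 - 1.5 × IQR and Q3 + 1.5 × IQR, and it assumes exclusive quartiles; the function name `whiskers_and_outliers` and the data are hypothetical and are not the Brand data from the figures.

```python
import statistics

def whiskers_and_outliers(values, multiplier=1.5):
    """Return (lower whisker, upper whisker, outliers) for one sample."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="exclusive")
    iqr = q3 - q1
    lower_limit = q1 - multiplier * iqr
    upper_limit = q3 + multiplier * iqr
    # Whiskers end at the most extreme data values inside the limits.
    inside = [v for v in values if lower_limit <= v <= upper_limit]
    outliers = [v for v in values if v < lower_limit or v > upper_limit]
    return min(inside), max(inside), outliers

# Hypothetical sample with one unusually large value.
sample = [40, 45, 50, 52, 56, 60, 62, 65, 70, 185]
low, high, outliers = whiskers_and_outliers(sample)
print(low, high, outliers)  # 40 70 [185]
```

Note that the upper whisker stops at 70, the largest value inside the limit, rather than at the limit itself.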

The boundaries of the box and whiskers are calculated using the formulas in Figure 2.

Figure 2 – Formulas for the Box Plot

The only outlier is 1850 for Brand B, which is higher than the upper whisker, and so is shown as a dot.

Note that we could also use the array formula

=MAX(IF(C2:C11<=H7,C2:C11,MIN(C2:C11)))

to calculate the value in cell H9, and the array formula

=MIN(IF(C2:C11>=H8,C2:C11,MAX(C2:C11)))

to calculate a value for cell H10. In fact, since the Excel Box Plot is only
available starting with Excel 2016, we can also use the Excel 2016 (non-array)
formulas =MAXIFS(C2:C11,"<="&H7) and =MINIFS(C2:C11,">="&H8).
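
For readers who find the array formulas hard to parse, the sketch below mirrors their logic in Python. Values that fail the condition are replaced by the range's minimum (or maximum), so they can never win the MAX (or MIN). The function names, the sample, and the two limits standing in for cells H7 and H8 are hypothetical.

```python
def upper_whisker(values, upper_limit):
    """Mirror of =MAX(IF(range<=limit, range, MIN(range)))."""
    filler = min(values)  # values above the limit are replaced by the minimum
    return max(v if v <= upper_limit else filler for v in values)

def lower_whisker(values, lower_limit):
    """Mirror of =MIN(IF(range>=limit, range, MAX(range)))."""
    filler = max(values)  # values below the limit are replaced by the maximum
    return min(v if v >= lower_limit else filler for v in values)

# Hypothetical sample and limits (stand-ins for cells H7 and H8).
sample = [40, 45, 50, 52, 56, 60, 62, 65, 70, 185]
top = upper_whisker(sample, 92.5)
bottom = lower_whisker(sample, 22.5)
print(top, bottom)  # 70 40
```

When at least one value satisfies the condition, this gives the same result as filtering first, which is what MAXIFS and MINIFS do.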

Real Statistics Data Analysis Tool

The Real Statistics Resource Pack also provides a way of generating box
plots with outliers. To produce such a box plot, proceed as in Example 1
of Creating Box Plots in Excel, except that this time you should select
the Box Plots with Outliers option of the Descriptive Statistics
and Normality data analysis tool. The output for Example 1 of Creating
Box Plots in Excel is shown in Figure 3.

Figure 3 – Output from Box Plots with Outliers tool

As you can see, the output is similar to that shown in Figure 1, except
that this version is available in other releases of Excel prior to Excel
2016. Also, the Outlier Multiplier is not fixed at 1.5 but can be set to
another value by the user (in the dialog box for the Descriptive
Statistics and Normality data analysis tool).

Data Analysis Tools Details

The Outlier Multiplier is shown in cell F2 of the output displayed in
Figure 3. This value is used in calculating the Min and Max values (which
are the values at the bottom of the lower whisker and the top of the
upper whisker). E.g. cell F12 contains the array formula

=MIN(IF(ISBLANK(A4:A13),"",IF(A4:A13>=F13-$F2*(F15-F13),A4:A13,"")))

and cell F16 contains the formula

=MAX(IF(ISBLANK(A4:A13),"",IF(A4:A13<=F15+$F2*(F15-F13),A4:A13,"")))

If the Percentage option is set on the Configuration dialog box, then
you should enter a value 100 times the desired value in the Outlier
Multiplier field; e.g. enter 150 if you want a 1.5 outlier multiplier factor.
Note too that if you leave this field blank, the outlier multiplier factor
defaults to 2.2.
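
The rules for interpreting the Outlier Multiplier field can be summarized as a small function. This is a sketch of the behavior described above, not the tool's actual code; the function name `resolve_outlier_multiplier` and its parameters are hypothetical.

```python
def resolve_outlier_multiplier(field, percentage_mode=False, default=2.2):
    """Turn the text entered in the Outlier Multiplier field into a factor."""
    # A blank field falls back to the default factor.
    if field is None or str(field).strip() == "":
        return default
    value = float(field)
    # With the Percentage option set, the entry is 100 times the factor.
    return value / 100 if percentage_mode else value

print(resolve_outlier_multiplier("150", percentage_mode=True))  # 1.5
print(resolve_outlier_multiplier(""))                           # 2.2
print(resolve_outlier_multiplier("1.5"))                        # 1.5
```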

Negative numbers are handled in a manner similar to that for Box Plots
without Outliers (often using a second y-axis). Keep in mind, though,
that a second y-axis is only employed when the lower whisker of at least
one of the box plots is negative. If some outlier is negative but none of
the lower whiskers are negative, then a second y-axis is not needed.

See Creating Box Plots with Outliers in Excel for how to create a box plot
with outliers manually, using only Excel charting capabilities. Issues that
arise when some of the data is negative are also explored in a little more
depth there.
