
Box Plots and Distribution

A box plot, or a box-and-whisker plot, is a graphical way to represent data for a single variable. A box plot uses an organized list of numeric data that has been divided into quartiles. These quartiles are shown as various sections along a number line.

An engineer might use box plots to compare the performance of two potential solutions to determine which one to develop further. Or, a quality-assurance specialist might use a box plot to find manufactured parts that show unusual characteristics.

Box plots allow you to quickly get an idea of the center and variability of a set of data. Multiple box plots are
very useful for comparing the same variable from two or more sets of data. Multiple box plots are sometimes
referred to as parallel box plots.

Components of a Box Plot


A box plot is a graphical display of the five-number summary of a distribution. The following five values are shown on a box plot:

Minimum, or smallest, value of the data set
Median value of the data set, which divides the data points in half, creating a "lower half" and an "upper half" of the data set (for a set of data with an even number of values, the median is the average of the two middle data values)
First quartile (Q1) of the data set, which is the median of the "lower half" of the data
Third quartile (Q3) of the data set, which is the median of the "upper half" of the data
Maximum, or largest, value of the data set
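The five values in the list above can be computed with a short Python sketch. It follows the median-of-halves method described in the list. One assumption: the text does not say how to split an odd number of values, so this sketch leaves the overall median out of both halves (a common textbook convention). Other quartile methods, such as the linear interpolation many software packages use by default, can give slightly different Q1 and Q3 values. The function names `five_number_summary` and `iqr` are illustrative, not from the source.

```python
from statistics import median


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) by the median-of-halves method.

    Assumption: with an odd number of values, the middle value (the overall
    median) is excluded from both the lower and the upper half.
    """
    values = sorted(data)
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = values[:half]
    # Skip the middle value when the count is odd
    upper = values[half + 1:] if n % 2 else values[half:]
    return (values[0], median(lower), median(values), median(upper), values[-1])


def iqr(data):
    """Interquartile range: the width of the box, Q3 - Q1."""
    _, q1, _, q3, _ = five_number_summary(data)
    return q3 - q1
```

For example, `five_number_summary([4, 8, 15, 16, 23, 42])` returns `(4, 8, 15.5, 23, 42)`: the median is the average of 15 and 16, and each quartile is the middle value of its half.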

The "box ends" are placed at Q1 and Q3. The boxed portion represents the middle 50 percent of the data and defines the interquartile range (IQR), which equals Q3 - Q1. The median value is shown as a line inside the box. The lines that extend from Q1 down to the minimum value and from Q3 up to the maximum value are called "whiskers."
Note: The mean can also be shown on a box plot. If so, it appears as a small "x" or as a dashed line at the location of the mean. Other variations of box plots call attention to outliers by plotting them as individual points, using a dot or star.

How to Create a Box Plot


You have designed a new component to use as part of a robotic arm assembly for your
company. This component will be printed using a 3-D printer. The hole diameter, as
designed, should measure 3.0 mm. You print ten sample components on your 3-D test
printer and measure the hole diameter for each part. Your data is below (all values in
mm):

2.9 3.0 3.1 3.0 3.1 2.8 3.0 3.2 3.1 2.9

1. Order the data from smallest to largest.

2. Divide the data into quartiles.

3. Determine the five-number summary.

4. Construct the box plot.
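The first three steps can be traced in Python for the ten hole-diameter measurements. This sketch mirrors the steps rather than serving as a general tool: it splits the ordered list into two halves of five, which works as written only because the count is even. Drawing the box plot itself is not shown.

```python
from statistics import median

# Hole diameters (mm) measured on the ten test prints
diameters = [2.9, 3.0, 3.1, 3.0, 3.1, 2.8, 3.0, 3.2, 3.1, 2.9]

# Order the data from smallest to largest
ordered = sorted(diameters)

# Divide the data into quartiles: with ten values (an even count), the
# lower half is the first five values and the upper half is the last five
half = len(ordered) // 2
lower_half = ordered[:half]
upper_half = ordered[half:]

# Determine the five-number summary
summary = {
    "minimum": ordered[0],
    "Q1": median(lower_half),
    "median": median(ordered),  # average of the 5th and 6th values
    "Q3": median(upper_half),
    "maximum": ordered[-1],
}

print(ordered)
print(summary)
```

Running it prints the ordered list `[2.8, 2.9, 2.9, 3.0, 3.0, 3.0, 3.1, 3.1, 3.1, 3.2]` and a five-number summary of 2.8, 2.9, 3.0, 3.1, and 3.2 mm.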


How to Interpret a Box Plot
When viewing box plots, each "section" (the minimum to Q1, Q1 to the median, the median to Q3, and Q3 to the
maximum) of the box plot represents 25 percent of the number of data values.  

Note: A box plot that is longer on one side does NOT mean that there is more data on that side. Instead, it
means that the data is more variable, or spread out, in that particular section.

If a data distribution is symmetric, its box plot will show the median roughly down the middle with an equal spread on either side. Skewed data will show a box plot where the median divides the box into two unequal pieces. If the longer part of the box is to the right of (or above) the median, the data is said to be skewed right. If the longer part is to the left of (or below) the median, the data is skewed left.
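The reading rules above can be expressed as a small Python sketch that works from a five-number summary. `section_widths` returns the length of each of the four sections, and `box_skew` compares the two parts of the box on either side of the median, as the text describes. Both names and the `tolerance` parameter are illustrative assumptions. The tiny default tolerance only absorbs floating-point rounding (for example, 3.1 - 3.0 is not exactly 0.1 in binary); deciding what counts as "roughly" symmetric in practice is a judgment call and would need a larger value.

```python
def section_widths(summary):
    """Lengths of the four box-plot sections from (min, Q1, median, Q3, max).

    Each section holds about a quarter of the data, so a longer section means
    those values are more spread out, not that there are more of them.
    """
    minimum, q1, med, q3, maximum = summary
    return (q1 - minimum, med - q1, q3 - med, maximum - q3)


def box_skew(summary, tolerance=1e-9):
    """Rough skew direction, judged only from the two parts of the box.

    tolerance: how close the two parts must be to count as equal
    (an assumed parameter; the default only absorbs rounding error).
    """
    _, q1, med, q3, _ = summary
    left, right = med - q1, q3 - med
    if abs(right - left) <= tolerance:
        return "roughly symmetric"
    return "skewed right" if right > left else "skewed left"


# Five-number summary of the hole diameters (mm), computed from the ten prints
hole_summary = (2.8, 2.9, 3.0, 3.1, 3.2)
print([round(w, 3) for w in section_widths(hole_summary)])
print(box_skew(hole_summary))
```

For the hole-diameter data, every section is 0.1 mm long, so the sketch reports the distribution as roughly symmetric.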
While a box plot can provide a general overview of the symmetry of a distribution, it is important to note that the exact
shape of the distribution is unknown. Two identical box plots could have very different underlying distributions. For
our 3-D printed component example, you can conclude that the distribution of the hole diameter data is roughly
symmetric because the box plot for this data shows an even spread among the sections. However, we don't know if
the shape is uniform, bell-shaped, or U-shaped from viewing the box plot. Other displays, such as a histogram or dot
plot, would give further insight into the shape of the distribution.
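To see why a box plot cannot pin down the exact shape, here is a Python sketch with two made-up data sets (hypothetical values chosen only for illustration) that produce identical five-number summaries, and therefore identical box plots, yet look very different in a simple text dot plot. It uses the median-of-halves method, again assuming the overall median is excluded from both halves when the count is odd. The `dot_plot` helper is hypothetical and handles whole-number data only.

```python
from collections import Counter
from statistics import median


def five_number_summary(data):
    """(minimum, Q1, median, Q3, maximum) by the median-of-halves method.

    Assumption: with an odd count, the middle value is left out of both halves.
    """
    values = sorted(data)
    half = len(values) // 2
    lower = values[:half]
    upper = values[half + 1:] if len(values) % 2 else values[half:]
    return (values[0], median(lower), median(values), median(upper), values[-1])


def dot_plot(data):
    """Text dot plot for whole-number data: one '*' per occurrence of each value."""
    counts = Counter(data)
    return "\n".join(f"{v:>2} | {'*' * counts[v]}" for v in range(min(data), max(data) + 1))


# Hypothetical data sets, made up for this illustration
evenly_spread = [1, 2, 3, 4, 5, 6, 7, 8, 9]
clustered = [1, 1, 4, 4, 5, 6, 6, 9, 9]

for name, data in [("evenly spread", evenly_spread), ("clustered", clustered)]:
    print(name, five_number_summary(data))
    print(dot_plot(data))
```

Both data sets have the summary (1, 2.5, 5, 7.5, 9), but the first dot plot is flat while the second shows values bunched at a few points with gaps between them.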

Using Parallel Box Plots to Compare Data


The true advantage of box plots is their ability to compare different distributions of the same variable. For our hole-diameter data, let's compare data from three separate 3-D printers. The box plots in this example use a vertical scale instead of a horizontal one. All of the box plots measure the same variable: hole diameter.
Example Analysis

Let's assume that your design specifies hole diameters to be 3.0 mm.

Three different printers will be used to produce the components for the robotic arm assembly. Printer 1 has a
median value at the target diameter of 3.0 mm but has produced parts with hole diameters as much as 0.2 mm
larger than specified. Printer 2 shows the most variability out of all of the printers, and has more than half of all
hole diameters smaller than specified. This might be a problem if the connector is too large to fit inside the hole.
Printer 3 produced the most consistent components. However, they are all above the target hole diameter of 3.0
mm. A quality-control engineer might want to determine how to change the settings on Printer 3 to print
components with smaller holes because that printer has the least output variability.

Identifying Outliers
The interquartile range (IQR) can be used as part of a rule of thumb for identifying outliers, in addition to serving as a measure of spread. This rule of thumb is called the 1.5 × IQR Rule. For a set of data, an outlier is an extreme value that differs greatly from the other values in the data set. To identify outliers, use the following two formulas to establish a lower and an upper fence:

$$\text{Lower fence} = Q_1 - 1.5 \times \text{IQR}$$

$$\text{Upper fence} = Q_3 + 1.5 \times \text{IQR}$$

Any value that does not fall between the fences is flagged as an outlier.
An Example

The roster for a professional football team included the following weights (in pounds) of players on the
offensive line. 

307 311 311 313 317 318 318 326 338 353

1. Find the five-number summary for these data values.

Minimum = 307, Q1 = 311, median = 317.5, Q3 = 326, maximum = 353

2. Calculate the IQR.

IQR = Q3 - Q1 = 326 - 311 = 15 pounds. In this example, the IQR is the range of the middle 50% of the offensive linemen's weights.

3. Calculate the upper and lower fences.

Upper fence = 326 + 1.5 × 15 = 348.5 pounds
Lower fence = 311 - 1.5 × 15 = 288.5 pounds

Any values that are above the upper fence (348.5 pounds) or below the lower fence (288.5 pounds) are considered outliers.

4. Determine if any outliers are present in the data.

No values fall below the lower fence. However, one data value lies above the upper fence: 353 pounds, which is therefore an outlier. Note that the right whisker now ends at the largest data value below the upper fence, 338 pounds.
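The football example can be reproduced in Python as a check on the arithmetic. This sketch computes the quartiles with the median-of-halves method (the ten weights split evenly into two halves of five), applies the 1.5 × IQR rule, and reports where the whiskers end once the outlier is plotted separately. The variable names are illustrative; `k = 1.5` is the multiplier from the rule.

```python
from statistics import median

# Offensive-line weights (pounds) from the roster
weights = [307, 311, 311, 313, 317, 318, 318, 326, 338, 353]

ordered = sorted(weights)
half = len(ordered) // 2  # ten values: an even count, so the halves split cleanly
q1 = median(ordered[:half])
q3 = median(ordered[half:])
summary = (ordered[0], q1, median(ordered), q3, ordered[-1])

# 1.5 x IQR rule
k = 1.5
iqr = q3 - q1
lower_fence = q1 - k * iqr
upper_fence = q3 + k * iqr

# Values outside the fences are outliers; the whiskers stop at the
# most extreme values that remain inside the fences
outliers = [w for w in ordered if w < lower_fence or w > upper_fence]
inside = [w for w in ordered if lower_fence <= w <= upper_fence]
whisker_ends = (min(inside), max(inside))

print("five-number summary:", summary)
print("IQR:", iqr)
print("fences:", (lower_fence, upper_fence))
print("outliers:", outliers)
print("whisker ends:", whisker_ends)
```

It reports an IQR of 15, fences at 288.5 and 348.5 pounds, 353 as the only outlier, and whisker ends of 307 and 338 pounds.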
Whenever you find outliers in your data, it is good practice to try to find an explanation for them. You may uncover a simple recording error in the data, or you may discover that something unexpected has occurred in a manufacturing process, for example.
