
Is there a way to summarize shape, location and spread all at the same time?
Luckily, we have a simple pictorial representation called the BOX
PLOT that can nicely summarize the main data features in
one go.

Using EXCEL Data Analysis Plus (an add-in), one can easily
generate a boxplot.

So, our focus will be on interpreting the boxplot.


Example: Monthly starting salary of
business school graduates

Based on a sample of 111 business graduates, we have generated five
boxplots corresponding to the following majors:
Accounting
Finance
Info Systems
Management
Marketing
Boxplots
Summary of the boxplots (Location)

The median for salaries is indicated by the line inside the box within
the boxplot.

>> Accounting and Info Systems graduates enjoy the highest median
salaries.
>> Management graduates are the least paid as shown by their median
salary.
Summary of the boxplots (skewness)
The boxplot for Accounting graduates shows that monthly salaries for this
major are symmetric (NOT SKEWED) because
Q3 (third quartile) and Q1 (first quartile) are almost equidistant
from Q2 (the median or second quartile).

Salary for Management is also nearly symmetric!


The boxplots of all other majors show some skewness. For example,
Finance: skewed to the LEFT (or negatively skewed) because Q1 is
farther from Q2 than Q3 is from Q2.
Info Systems and Marketing: skewed to the RIGHT (or positively skewed)
because Q3 is farther from Q2 than Q1 is from Q2.
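
The quartile comparison above can be sketched in a few lines of Python. This is only an illustration: the function name `box_skew`, the tolerance `tol`, and the sample values are assumptions made for the example and are not the salary data from the course.

```python
from statistics import quantiles

def box_skew(values, tol=0.1):
    """Describe skewness by comparing Q2 - Q1 with Q3 - Q2.

    `tol` is a hypothetical tolerance (a fraction of the IQR) below
    which the two distances are treated as "almost equidistant".
    """
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    lower, upper = q2 - q1, q3 - q2
    iqr = q3 - q1
    if iqr == 0 or abs(upper - lower) <= tol * iqr:
        return "roughly symmetric"
    return "right-skewed" if upper > lower else "left-skewed"

# Made-up values, for illustration only
print(box_skew([10, 20, 30, 40, 50]))   # Q1 and Q3 equidistant from Q2
print(box_skew([10, 30, 37, 38, 40]))   # Q1 farther from Q2
print(box_skew([30, 32, 33, 40, 60]))   # Q3 farther from Q2
```

Note that different software may compute quartiles slightly differently, so results for borderline cases can vary.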
Can we infer anything about mean or average
salaries from the boxplots?

Accounting: Salary data is symmetric. Therefore, the median salary will be
approximately the same as the mean salary.
Management: Salary data is near-symmetric. Therefore, median and
average salaries will be close.
Finance: Salary data is skewed to the left. Therefore, average salary will
be less than median salary.
Info Systems and Marketing: Salary data is skewed to the right.
Therefore, average salary will be greater than the median salary.
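
A quick check of this rule of thumb, using small made-up samples (not the course's salary data). The helper name `mean_vs_median` is hypothetical. The relationship usually holds for clearly skewed data, but it is a guideline rather than a guarantee.

```python
from statistics import mean, median

def mean_vs_median(values):
    """Report how the mean compares with the median for a sample."""
    m, med = mean(values), median(values)
    if m > med:
        return "mean > median"
    if m < med:
        return "mean < median"
    return "mean == median"

# Made-up values, for illustration only
symmetric = [10, 20, 30, 40, 50]
left_skewed = [10, 30, 37, 38, 40]    # long tail of low values
right_skewed = [30, 32, 33, 40, 60]   # long tail of high values

for name, data in [("symmetric", symmetric),
                   ("left-skewed", left_skewed),
                   ("right-skewed", right_skewed)]:
    print(name, mean(data), median(data), mean_vs_median(data))
```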
Summary of the boxplots (variability)
In a boxplot, variability is indicated by the size of the box. In other
words, if Q3 - Q1, which is simply the IQR (we have seen this earlier in
this course), is large, we conclude that the data is more variable.

Therefore, Salary for Accounting shows the most variability. The other
salaries show comparable variability.
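
As a rough sketch of how box sizes can be compared numerically, the snippet below computes the IQR for two hypothetical groups. The group names, the values, and the helper `iqr` are assumptions made for the example; they do not come from the salary sample.

```python
from statistics import quantiles

def iqr(values):
    """Interquartile range: Q3 - Q1, the height of the box."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1

# Made-up values, for illustration only
samples = {
    "group_a": [20, 28, 35, 42, 50],  # wide box
    "group_b": [30, 33, 35, 37, 40],  # narrow box
}

spreads = {name: iqr(vals) for name, vals in samples.items()}
most_variable = max(spreads, key=spreads.get)
print(spreads, most_variable)
```

Both groups share the same median here, so the difference in IQR reflects spread only, not location.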
Summary of the boxplots (Outliers)
Rule of thumb:
Lower limit = Q1 - 1.5 * IQR
Upper limit = Q3 + 1.5* IQR

Outlier: A value greater than the Upper limit or less than the Lower limit
is considered an outlier.
In the boxplots, outliers are shown as circled points lying beyond the
boxplot whiskers.
Accounting: there are 2 outliers
Finance, Management and Marketing: there is 1 outlier each
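
The rule of thumb can be applied directly in code. The sketch below uses a made-up sample and a hypothetical helper `find_outliers`; the quartile method used here may differ slightly from the one in the Excel add-in, so limits computed elsewhere may not match exactly.

```python
from statistics import quantiles

def find_outliers(values):
    """Apply the 1.5 * IQR rule and return (lower, upper, outliers)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr   # Lower limit = Q1 - 1.5 * IQR
    upper = q3 + 1.5 * iqr   # Upper limit = Q3 + 1.5 * IQR
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers

# Made-up values, for illustration only
data = [20, 30, 32, 34, 35, 36, 38, 40, 70]
lower, upper, outliers = find_outliers(data)
print(lower, upper, outliers)
```

For this sample, Q1 = 32 and Q3 = 38, so the IQR is 6 and the limits are 23 and 47; the values 20 and 70 fall outside them.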
Something to think about
Should outliers always be thrown out from the
analysis?

Can they be of any use?


VIDEO is available
In Course Titanium an exercise video is available.

VIDEO: Quartiles and Boxplot


Please watch this video immediately after this
section has been covered in class.
