Chapter 3 Slides #1 Shape and Location


Learning Goals
Shape (Skewness, symmetry, modality)
Location (mean, median, mode)
Variability/Spread (variance, standard deviation, range, IQR)
Relative Location (z-score)
Empirical Rule and Chebyshev’s Theorem
Descriptive Statistics
The purpose of descriptive statistics is to convert data into
information using meaningful charts (graphs) and numerical
summaries.

Most statistical software packages can routinely generate all types of
descriptive statistics.
In this chapter, we give some practical guidance on the proper
interpretation of commonly used descriptive statistics.
Relevant features of your data
To use descriptive statistics tools effectively, it is important to understand
the following three features of data (our focus will be on quantitative data):

1) Shape of the data

2) Location of the data

3) Variability/Spread of the data

Do not see these aspects as separate; they complement each other. At the
same time, treating them one at a time helps you see each of them more clearly.
1) Shape of data
To understand the shape of your data, display its HISTOGRAM,
which is simply a graphical summary of the data.
Most statistical software can generate histograms automatically.
Although you can also create histograms in Excel, they need
considerable formatting to look right. So, to save time and
for brevity, we will always provide software-generated
histograms when needed.
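Behind the scenes, a histogram just counts how many observations fall into each of several equal-width intervals (bins). The Python below is a minimal sketch of that idea, not what any particular package does: the function names, the small illustrative dataset, and the choice of 4 bins are our own assumptions, and real software applies its own rules for choosing the number and edges of bins.

```python
def histogram_counts(data, num_bins):
    """Count observations in each of num_bins equal-width bins spanning min..max."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for x in data:
        # Bin index for x; if all values are equal (width 0), use the first bin.
        i = int((x - lo) / width) if width > 0 else 0
        # The maximum value would land one past the end, so clamp it into the last bin.
        counts[min(i, num_bins - 1)] += 1
    return counts


def print_text_histogram(data, num_bins):
    """Print one row of asterisks per bin, labeled with the bin's edges."""
    lo = min(data)
    width = (max(data) - lo) / num_bins
    for i, c in enumerate(histogram_counts(data, num_bins)):
        left = lo + i * width
        print(f"{left:6.2f} - {left + width:6.2f} | {'*' * c}")


# Small illustrative dataset (hypothetical).
data = [4, 2, 3, 3, 2, 2, 1, 4, 3, 2]
print_text_histogram(data, 4)
```

Reading the rows of asterisks from top to bottom gives the same picture as the bars of a histogram turned on its side.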
What to look for in a histogram
When trying to understand the shape of your data, look for
these key features:

1) Symmetry
2) Skewness

3) Modality

Your data will exhibit one or more of the above features.

Data examples

The following datasets exhibit different shapes, as shown in
their respective histograms.
Weights of tomatoes (grown in an experimental) for a sample
Histogram left, Boxplot right
This data is symmetric.

It is also bell (mound) shaped.

Many business and economic datasets exhibit this feature.

The data resembles what we call the Normal distribution.

CEO annual pay at the 500 largest US firms in 2008
Histogram left, Boxplot right
This data is skewed to the right OR is positively skewed.

A few very large salaries stretch the data to the right!

Data such as income are often skewed to the right.

Number of books shipped out daily by for
100 selected days
Histogram left, Boxplot right
This is skewed to the left OR negatively skewed.

A few small values stretch the data to the left!

This data shape is rare in business and economic applications.
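The direction of skew can also be checked numerically. This sketch uses the moment coefficient of skewness, g1 = m3 / m2^(3/2), which is one of several common definitions; statistical software often applies a small-sample adjustment, so its values may differ somewhat. The three datasets are hypothetical and chosen only to show the sign: zero for symmetric data, positive for a right skew, negative for a left skew.

```python
from statistics import fmean


def moment_skewness(data):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (no small-sample adjustment).

    Assumes the data are not all identical (otherwise m2 is 0).
    """
    n = len(data)
    xbar = fmean(data)
    m2 = sum((x - xbar) ** 2 for x in data) / n  # average squared deviation
    m3 = sum((x - xbar) ** 3 for x in data) / n  # average cubed deviation
    return m3 / m2 ** 1.5


# Hypothetical datasets illustrating each shape.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]   # one large value stretches the right tail
left_skewed = [1, 8, 9, 9, 10, 10]   # one small value stretches the left tail

for name, values in [("symmetric", symmetric),
                     ("right skewed", right_skewed),
                     ("left skewed", left_skewed)]:
    print(f"{name}: {moment_skewness(values):+.3f}")
```

Cubing the deviations keeps their signs, so a few far-out values on one side dominate m3 and decide the sign.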

Grades of Economics Midterm
Histogram left, Boxplot right
This has two features: it is skewed to the left and bimodal.

We say there are two modes (the data is bimodal) because
there are two distinct groups of grades in the class.

Modality tells you whether there are distinct groups in
your data.
You can imagine how wrong our analysis would be if we
treated our data as homogeneous while the histogram
indicates otherwise.
VIDEO is available on shapes
On Titanium, an exercise video on shapes is available.
Please watch the video immediately after this section
has been covered in class.
2) Location of data
A location measure is a single numerical value that summarizes
what is typical in the data.

We present 3 location measures.

These measures are also called measures of central tendency, i.e.,
the data tends to cluster around them.

Thus, they can help summarize data with a single numerical value.

Location Measures
◦ Mean -- the simple average

◦ Median -- the middle observation after the data has been sorted


>> Our focus in this course will be on mean and median

◦ Mode -- the observation that occurs most often

>> For continuous variables, the definition of mode given above
is not meaningful, since exact values rarely repeat.
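To see why the mode breaks down for continuous data, consider a few measurements recorded to one decimal place. The weights below are hypothetical (in grams). Because no value repeats, Python's `statistics.multimode` (Python 3.8+) reports every observation as a mode, which tells us nothing about where the data are centered.

```python
from statistics import multimode

# Hypothetical tomato weights in grams; no value occurs more than once.
weights = [152.3, 148.9, 160.1, 155.7, 149.4]

# multimode returns all values tied for the highest count, here all five.
print(multimode(weights))
```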
Median (Even Number of Data Points)
The average of the two middle observations

First put the data in either ascending or descending order

DATA: 4,2,3,3,2,2,1,4,3,2

There is an even number of data points (10)

ASCENDING ORDER: 1,2,2,2,2,3,3,3,4,4 (middle two: 2, 3)

Median = average of the middle two = (2+3)/2 = 2.5
Median (Odd Number of Data Points)

Suppose we extend the previous data by one more observation, which has a
value of 4.
Then, we will have an odd number of data points (11)

ASCENDING ORDER: 1,2,2,2,2,3,3,3,4,4,4

Median is the middle (6th) observation

Median = 3
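The two median cases above follow one recipe: sort, then take the middle value when n is odd or average the two middle values when n is even. The function below is a minimal sketch of that recipe (the name `median` is our own; Python's standard `statistics.median` gives the same results), applied to the 10- and 11-point datasets from the examples.

```python
def median(values):
    """Median by hand: sort, then take the middle value (odd n)
    or the average of the two middle values (even n).
    Assumes values is non-empty."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd n: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average the middle two


data = [4, 2, 3, 3, 2, 2, 1, 4, 3, 2]
print(median(data))        # 2.5 (even n = 10)
print(median(data + [4]))  # 3   (odd n = 11)
```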
Mode
The observation that occurs most often in a dataset

DATA: 4,2,3,3,2,2,1,4,3,2
Mode = 2 (it occurs 4 times)

We can have data with more than one mode

◦ Bimodal, multimodal
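Finding the mode amounts to building a frequency table and picking the value(s) with the highest count. The sketch below does this with Python's `collections.Counter` and `statistics.multimode` (Python 3.8+). The first dataset is the one from the example above; the bimodal dataset is hypothetical.

```python
from collections import Counter
from statistics import multimode

data = [4, 2, 3, 3, 2, 2, 1, 4, 3, 2]

# Frequency table, most common value first.
print(Counter(data).most_common())  # [(2, 4), (3, 3), (4, 2), (1, 1)]
print(multimode(data))              # [2]

# Hypothetical bimodal data: 2 and 4 each appear three times.
bimodal = [1, 2, 2, 2, 3, 4, 4, 4, 5]
print(multimode(bimodal))           # [2, 4]
```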
Computing location in Excel
Suppose data are in cells A2 to A11
Mean -- = AVERAGE(A2:A11)
Median -- = MEDIAN(A2:A11)
Mode -- = MODE(A2:A11)
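If you work outside Excel, Python's standard `statistics` module offers close counterparts of these three formulas. In this sketch, the list `column_a` is a hypothetical stand-in for the values in cells A2:A11 (we reuse the 10 data points from the median example).

```python
import statistics

# Hypothetical stand-in for the values stored in cells A2:A11.
column_a = [4, 2, 3, 3, 2, 2, 1, 4, 3, 2]

print(statistics.mean(column_a))    # like =AVERAGE(A2:A11) -> 2.6
print(statistics.median(column_a))  # like =MEDIAN(A2:A11)  -> 2.5
print(statistics.mode(column_a))    # like =MODE(A2:A11)    -> 2
```

One difference worth knowing: when no value repeats, Excel's MODE returns #N/A, whereas `statistics.mode` (Python 3.8 and later) returns the first value it encounters.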

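You can also use the Descriptive Statistics option from Data Analysis in the
Tools Menu (on the Data tab in newer versions of Excel, once the Analysis
ToolPak add-in is enabled)

In the dialog box:
◦ Input Range -- where the data values are stored
◦ Enter the name of the Output Worksheet
◦ Check both Summary Statistics and Confidence Level

In the output, drag to make Column A wider; the output includes the
Sample Mean, Sample Median, and Sample Mode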
Which location measure to use?
This depends on the SHAPE of your data.

When the histogram is unimodal and symmetric, all the location measures
are close to each other, and the mean is recommended due to its simplicity.

When the histogram shows a great deal of skewness, the median is a better
measure than the mean, as it still represents the center of the data.

The mode is used rarely and serves mainly as an indicator of groups within
the data.

Skewness, mean and median
The mean is pulled in the direction of the skew!! Never forget this.

Therefore, when data is skewed to the right, the mean is usually
greater than the median.

When data is skewed to the left, the mean is usually less than the median.
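A quick numeric check of this rule: in the sketch below, both datasets are hypothetical. One very large salary drags the mean far above the median (right skew), and one very low score drags the mean below the median (left skew). In both cases the median stays with the bulk of the data.

```python
from statistics import mean, median

# Hypothetical annual salaries in $1,000s: one very large value (right skew).
salaries = [40, 45, 50, 55, 60, 65, 900]

# Hypothetical exam scores: one very low value (left skew).
scores = [5, 80, 85, 88, 90, 92, 95]

print(f"salaries: mean = {mean(salaries):.1f}, median = {median(salaries)}")
print(f"scores:   mean = {mean(scores):.1f}, median = {median(scores)}")
```

For the salaries, the mean (about 173.6) exceeds every value except the outlier, while the median (55) still describes a typical salary.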

Something to think about
Suppose I gave you the following two summaries of single-family home
prices in Orange County:

A) 900,000
B) 350,000

Can you tell which is more likely the mean and which is the median?
VIDEO is available on
skewness, mean and median
On Titanium, an exercise video is available.
VIDEO: Mean and Median
Please watch the video immediately after this section
has been covered in class.
