CH 03

Applied Business Statistics, 7th ed.
by Ken Black
Chapter 3
Descriptive
Statistics
Copyright 2011 John Wiley & Sons, Inc.
Statistics and Analytics
Statistics
 The study of the collection, organization, analysis, interpretation, and presentation of data
Analytics
 The discovery and communication of meaningful patterns in data.
 Especially valuable in areas rich with recorded information
 Analytics relies on the simultaneous application of statistics, computer programming and
operations research to quantify performance.

Mean, Mode, Median?

Learning Objectives
Distinguish between measures of central tendency,
measures of variability, measures of shape, and
measures of association.
Understand the meanings of mean, median, mode,
quartile, percentile, and range.
Compute mean, median, mode, percentile, quartile,
range, variance, standard deviation, and mean
absolute deviation on ungrouped data.
Differentiate between sample and population
variance and standard deviation.

Learning Objectives -- Continued
Understand the meaning of standard deviation as it is
applied by using the empirical rule and Chebyshev’s
theorem.
Compute the mean, median, standard deviation, and
variance on grouped data.
Understand box and whisker plots, skewness, and
kurtosis.
Compute a coefficient of correlation and interpret it.

Measures of Central Tendency:
Ungrouped Data
Measures of central tendency yield information
about “particular places or locations in a group of
numbers.”
Common Measures of Location
Mode
Median
Mean
Percentiles
Quartiles

Mode
Mode - the most frequently occurring value in a data set
Applicable to all levels of data measurement (nominal,
ordinal, interval, and ratio)
Can be used to determine what categories occur most
frequently
Sometimes, no mode exists (no duplicates)
Bimodal – In a tie for the most frequently occurring
value, two modes are listed
Multimodal -- Data sets that contain more than two
modes

Median
Median - middle value in an ordered array of numbers
Half the data are above it, half the data are below it
Mathematically, it is the ((n+1)/2)th ordered observation
For an array with an odd number of terms, the median is the middle number
n=11 => (n+1)/2 = 12/2 = 6, so the median is the 6th ordered observation
For an array with an even number of terms, the median is the average of the middle two numbers
n=10 => (n+1)/2 = 11/2 = 5.5, so the median is the average of the 5th and 6th ordered observations
The median is unaffected by extreme values, so it is often used to summarize data such as salaries and ages
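The position rule above can be sketched in a few lines of Python. This is a minimal illustration, not a library routine: the function name `median_by_position` and the two sample lists are hypothetical, chosen only to show the odd (n = 11) and even (n = 10) cases.

```python
def median_by_position(values):
    """Median via the (n + 1) / 2 ordered-position rule."""
    if not values:
        raise ValueError("median of empty data is undefined")
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) / 2            # 1-based position; ends in .5 when n is even
    lower = ordered[int(position) - 1]
    if position.is_integer():         # odd n: a single middle observation
        return lower
    upper = ordered[int(position)]    # even n: average the two middle observations
    return (lower + upper) / 2


# Hypothetical data, for illustration only
odd_sample = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]   # n = 11, uses the 6th ordered value
even_sample = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]     # n = 10, averages the 5th and 6th

print(median_by_position(odd_sample))
print(median_by_position(even_sample))
```

Sorting first matters: the rule refers to positions in the ordered array, not in the order the data were collected.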

Arithmetic Mean
Mean is the average of a group of numbers

Applicable for interval and ratio data
Not applicable for nominal or ordinal data
Affected by each value in the data set, including
extreme values
Computed by summing all values in the data set and
dividing the sum by the number of values in the data
set

Size of shoes in the MBA class
Collect a sample of students' shoe sizes:
8, 9, 9, 8, 10, 11, 10, 9, 8, 9, 10, 11, 11, 10, 10, 10, 10, 10, 10, 11, 11, 12, 11, 10, 9,
7, 8, 7, 7
Determine
Mean:
Median:
Mode:
What do you think is the right measure of size of shoes for the class?

Demonstration Problem 3.1
The number of U.S. cars in service by top car rental companies
in a recent year according to Auto Rental News follows.
Company Number of Cars in Service
Enterprise 643,000
Hertz 327,000
National/Alamo 233,000
Avis 204,000
Dollar/Thrifty 167,000
Budget 144,000
Advantage 20,000
U-Save 12,000
Payless 10,000
ACE 9,000
Fox 9,000
Rent-A-Wreck 7,000
Triangle 6,000
Compute the mode, the median, and the mean.

Solutions
Mode: 9,000 (two companies with 9,000 cars in service)
Median: With 13 different companies in this group, N = 13.

The median is located at the (13 +1)/2 = 7th position.
Because the data are already ordered, median is the 7th
term, which is 20,000.
Mean: μ = ∑x/N = (1,791,000/13) = 137,769.23
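The three results above can be cross-checked with Python's standard `statistics` module. This is a minimal sketch; the variable name `cars_in_service` is an illustrative choice, and the values are the thirteen counts from the problem.

```python
import statistics

# Cars in service for the 13 companies in Demonstration Problem 3.1
cars_in_service = [643_000, 327_000, 233_000, 204_000, 167_000, 144_000,
                   20_000, 12_000, 10_000, 9_000, 9_000, 7_000, 6_000]

mode = statistics.mode(cars_in_service)      # most frequently occurring value
median = statistics.median(cars_in_service)  # middle value of the ordered data
mean = statistics.mean(cars_in_service)      # sum of the values divided by N

print(mode, median, round(mean, 2))
```

Note how far the mean sits above the median here: a few very large fleets pull the mean upward, while the median stays with the typical company.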

Property Prices in Mumbai
Collect a sample of property prices (INR per Sq. Ft.) in Mumbai:

5000, 3000, 6000, 4000, 5500, 13000, 6500, 7000, 8000, 20000, 9000, 27000, 10000, 27000,
28000
Determine
Mean:
Median:
Mode:
What do you think is the right measure of central tendency to depict the true picture of
property prices in Mumbai?

Percentiles
Percentile - measures of central tendency that divide
a group of data into 100 parts
At least n% of the data lie at or below the nth
percentile, and at most (100 - n)% of the data lie
above the nth percentile
Example: 90th percentile indicates that at least 90%
of the data are equal to or less than it, and at most
10% of the data lie above it

Calculating Percentiles
To calculate the pth percentile,
Order the data from smallest to largest
Calculate i = N (p/100)
N is the total number of observations
Determine the percentile
If i is a whole number, then use the average of the ith and (i+1)th ordered observations
Otherwise, round i up to the next whole number and use that ordered observation

Quartiles
Quartile - measures of central tendency that divide a
group of data into four subgroups
Q1: 25% of the data set is below the first quartile
Q2: 50% of the data set is below the second quartile
Q3: 75% of the data set is below the third quartile
[Figure: Q1, Q2, and Q3 divide the ordered data into four parts of 25% each]

Quartiles for Demonstration Problem 3.1
For the cars in service data, N = 13, so
Q1: i = 13 (25/100) = 3.25, so use the 4th ordered observation
Q1 = 9,000
Q3: i = 13 (75/100) = 9.75, so use the 10th ordered observation
Q3 = 204,000
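The location rule i = N (p/100) used for these quartiles can be sketched as a small function. `percentile_by_location` is a hypothetical helper name, not a library call; routines such as `statistics.quantiles` interpolate between observations and may return slightly different quartiles.

```python
import math


def percentile_by_location(values, p):
    """p-th percentile using the i = N * (p / 100) location rule."""
    if not values:
        raise ValueError("percentile of empty data is undefined")
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)
    i = len(ordered) * p / 100
    if i.is_integer():
        # Whole number: average the i-th and (i+1)-th ordered observations
        k = int(i)
        return (ordered[k - 1] + ordered[k]) / 2
    # Otherwise round i up and take that ordered observation
    return ordered[math.ceil(i) - 1]


cars_in_service = [643_000, 327_000, 233_000, 204_000, 167_000, 144_000,
                   20_000, 12_000, 10_000, 9_000, 9_000, 7_000, 6_000]

q1 = percentile_by_location(cars_in_service, 25)
q2 = percentile_by_location(cars_in_service, 50)
q3 = percentile_by_location(cars_in_service, 75)
print(q1, q2, q3, q3 - q1)   # the last value is the spread between Q1 and Q3
```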

Which Measure Do I Use?
Which measure of central tendency is most appropriate?

In general, the mean is preferred, since it has nice mathematical properties (in particular, see Chapter 7)
The median and quartiles are resistant to outliers
Consider the following three datasets
1, 2, 3 (median=2, mean=2)
1, 2, 6 (median=2, mean=3)
1, 2, 30 (median=2, mean=11)
All have median=2, but the mean is sensitive to the outliers
In general, if there are outliers, the median is preferred to the mean
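The three small datasets above make a quick experiment. The sketch below, using only the standard `statistics` module, recomputes both measures for each dataset; the names `datasets` and `summaries` are illustrative.

```python
import statistics

# The three datasets from the slide; only the largest value changes
datasets = [[1, 2, 3], [1, 2, 6], [1, 2, 30]]

# One (median, mean) pair per dataset
summaries = [(statistics.median(d), statistics.mean(d)) for d in datasets]

for data, (median, mean) in zip(datasets, summaries):
    print(data, "median =", median, "mean =", mean)
```

The median stays fixed because it depends only on the middle position, while the mean uses every value and so follows the outlier.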

Measures of Spread or Dispersion:
Ungrouped Data
Common Measures of Variability
Range
Inter-quartile Range
Mean Absolute Deviation
Variance and Standard Deviation
Coefficient of Variation

Range
The difference between the largest and the smallest
values in a set of data
Advantage – easy to compute
Disadvantage – is affected by extreme values

Interquartile Range
Interquartile Range - range of values between the first and third quartiles
Interquartile Range = Q3 - Q1
Range of the “middle half”; middle 50%
Useful when researchers are interested in the middle 50%, and not the extremes
Example: For the cars in service data, the IQR is 204,000 – 9,000 = 195,000
Example: if the salary IQR is 10 LPA, how do we interpret it?

Deviations from the mean
Useful for interval or ratio level data

An examination of deviation from the mean can reveal information about the variability of the
data
Deviations are used mostly as a tool to compute other measures of variability
However, the sum of deviations from the arithmetic mean is always zero:
Sum (X - µ) = 0
There are two ways to solve this conundrum…
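The zero-sum property of the deviations follows directly from the definition of the mean. A short derivation, assuming a population of N values with mean µ = ΣX / N (so that ΣX = Nµ):

```latex
\sum (X - \mu) = \sum X - N\mu = N\mu - N\mu = 0
```

Because positive and negative deviations always cancel in this way, a useful measure of spread has to remove the signs first, either by taking absolute values or by squaring.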

Mean Absolute Deviation (MAD)
X X-µ |X-µ|
5 -8 8
9 -4 4
16 3 3
17 4 4
18 5 5
SUM: 65 0 24
$MAD = \frac{\sum |X - \mu|}{n} = \frac{24}{5} = 4.8$
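The table can be reproduced with a short function. This is a sketch under the assumption that the five X values are treated as the whole data set; `mean_absolute_deviation` is a hypothetical helper name.

```python
def mean_absolute_deviation(values):
    """Average of the absolute deviations from the arithmetic mean."""
    if not values:
        raise ValueError("MAD of empty data is undefined")
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)


data = [5, 9, 16, 17, 18]                      # the X column of the table
mean = sum(data) / len(data)
deviations = [x - mean for x in data]          # signed deviations, which sum to zero
absolute_deviations = [abs(d) for d in deviations]

print(sum(deviations), sum(absolute_deviations))
print(mean_absolute_deviation(data))
```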

Sample Variance
Another solution is to take the Sum of Squared
Deviations (SSD) about the mean
Sample Variance - average of the squared deviations
from the arithmetic mean, computed with n - 1 in the denominator
Sample Variance – denoted by s²
X X-Xbar (X-Xbar)²
2,398 625 390,625
1,844 71 5,041
1,539 -234 54,756
1,311 -462 213,444
SUM: 7,092 0 663,866
$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{663{,}866}{3} \approx 221{,}289$
Sample Standard Deviation
Sample standard deviation is the square root of the sample variance
Same units as original data
$s = \sqrt{s^2} = \sqrt{221{,}289} \approx 470.4$
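The sample variance and standard deviation for the four values in the table can be recomputed step by step and then cross-checked against the standard `statistics` module, whose `variance` and `stdev` functions also divide by n - 1. Variable names here are illustrative.

```python
import statistics

data = [2398, 1844, 1539, 1311]        # the X column of the table

mean = statistics.mean(data)
# Sum of squared deviations (SSD) about the mean
ssd = sum((x - mean) ** 2 for x in data)
sample_variance = ssd / (len(data) - 1)    # n - 1 in the denominator
sample_std = sample_variance ** 0.5        # back in the units of the original data

print(ssd, round(sample_variance), round(sample_std, 1))

# Library cross-check: both use the n - 1 (sample) definition
print(statistics.variance(data), statistics.stdev(data))
```

Taking the square root matters for interpretation: the variance is in squared units, while the standard deviation is in the same units as the data.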

Solution
The researcher computes the mean absolute
deviation, the variance, and the standard deviation
for these data in the following manner.
X X-Xbar |X-Xbar| (X-Xbar)²
55 -41 41 1,681
100 4 4 16
125 29 29 841
140 44 44 1,936
60 -36 36 1,296
SUM: 480 0 154 5,770
$MAD = 154 / 5 = 30.8$
$s^2 = 5{,}770 / 4 \approx 1{,}443$
$s = \sqrt{1{,}443} \approx 38$

Z Scores
Z score – represents the number of standard deviations a value
(x) is above or below the mean of a set of numbers
Z score allows translation of a value’s raw distance
from the mean into units of standard deviation
Z = (x-µ)/σ
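The formula translates directly into code. In this minimal sketch the function name `z_score` and the numbers passed to it (a mean of 50 and a standard deviation of 10) are hypothetical, chosen only so the arithmetic is easy to follow.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


# Hypothetical population: mean 50, standard deviation 10
above = z_score(70, mu=50, sigma=10)   # 20 units above the mean
below = z_score(35, mu=50, sigma=10)   # 15 units below the mean
print(above, below)
```

A positive result means the value is above the mean and a negative result means it is below; the magnitude is the distance in standard deviations.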

Coefficient of Variation (CV) – measures the volatility of a value
(perhaps a stock portfolio), relative to its mean. It’s the ratio of
the standard deviation to the mean, expressed as a percentage
Useful when comparing standard deviations computed from data with
different means
Measurement of relative dispersion
$C.V. = \frac{\sigma}{\mu}(100)$
Example:
A standard deviation of 10 on a mean of 20 (C.V. = 50%)
A standard deviation of 10 on a mean of 1000 (C.V. = 1%)

Consider two different populations
$\mu_1 = 29, \quad \sigma_1 = 4.6$
$\mu_2 = 84, \quad \sigma_2 = 10$
$C.V._1 = \frac{\sigma_1}{\mu_1}(100) = \frac{4.6}{29}(100) = 15.86$
$C.V._2 = \frac{\sigma_2}{\mu_2}(100) = \frac{10}{84}(100) = 11.90$
Which population is more variable?
Since 15.86 > 11.90, the first population is more variable,
relative to its mean, than the second population
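The comparison of the two populations can be sketched as follows; `coefficient_of_variation` is a hypothetical helper name, and the inputs are the means and standard deviations given for the two populations.

```python
def coefficient_of_variation(sigma, mu):
    """Standard deviation as a percentage of the mean."""
    if mu == 0:
        raise ValueError("the coefficient of variation is undefined for a zero mean")
    return sigma / mu * 100


cv1 = coefficient_of_variation(sigma=4.6, mu=29)   # first population
cv2 = coefficient_of_variation(sigma=10, mu=84)    # second population

more_variable = "first" if cv1 > cv2 else "second"
print(round(cv1, 2), round(cv2, 2), more_variable)
```

The second population has the larger standard deviation in absolute terms, yet the first is more variable relative to its mean, which is the point of using a relative measure.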

Calculation of Grouped Mean
Sometimes data are already grouped, and you are interested in
calculating summary statistics
Interval Frequency (f) Midpoint (M) f*M
20-under 30 6 25 150
30-under 40 18 35 630
40-under 50 11 45 495
50-under 60 11 55 605
60-under 70 3 65 195
70-under 80 1 75 75
Total 50 2150
$\mu = \frac{\sum fM}{\sum f} = \frac{2150}{50} = 43.0$
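The grouped mean weights each class midpoint by its frequency. In this sketch the frequency table is stored as a list of ((lower, upper), frequency) pairs; that layout and the name `grouped_mean` are assumptions made for illustration.

```python
# Each entry: ((lower limit, upper limit), frequency)
classes = [((20, 30), 6), ((30, 40), 18), ((40, 50), 11),
           ((50, 60), 11), ((60, 70), 3), ((70, 80), 1)]


def grouped_mean(classes):
    """Mean of grouped data: sum of f * M divided by sum of f."""
    total_f = sum(f for _, f in classes)
    if total_f == 0:
        raise ValueError("no observations")
    # The midpoint M of each class stands in for every value in that class
    total_fm = sum(f * (lower + upper) / 2 for (lower, upper), f in classes)
    return total_fm / total_f


print(grouped_mean(classes))
```

The result is an approximation of the mean of the underlying raw data, since every observation is treated as if it sat at its class midpoint.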

Median of Grouped Data - Example
Class Interval Frequency Cumulative Frequency
20-under 30 6 6
30-under 40 18 24
40-under 50 11 35
50-under 60 11 46
60-under 70 3 49
70-under 80 1 50
N = 50
$Md = L + \frac{\frac{N}{2} - cf_p}{f_{med}}(W) = 40 + \frac{\frac{50}{2} - 24}{11}(10) = 40.909$
Steps:
1. Compute the cumulative frequencies
2. Find N/2
3. Find the median class: the first class whose cumulative frequency reaches N/2
4. L: lower limit of the median class
5. cfp: cumulative frequency of the classes preceding the median class
6. W: class width
7. fmed: frequency of the median class
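The steps for the grouped median can be followed in code. This is a sketch only: `grouped_median` is a hypothetical helper, and the table is stored as ((lower, upper), frequency) pairs in class order.

```python
# Each entry: ((lower limit, upper limit), frequency), in ascending class order
classes = [((20, 30), 6), ((30, 40), 18), ((40, 50), 11),
           ((50, 60), 11), ((60, 70), 3), ((70, 80), 1)]


def grouped_median(classes):
    """Median of grouped data: L + ((N/2 - cf) / f_med) * W."""
    n = sum(f for _, f in classes)
    if n == 0:
        raise ValueError("no observations")
    half = n / 2
    cumulative = 0                      # cumulative frequency of the preceding classes
    for (lower, upper), f in classes:
        if cumulative + f >= half:      # first class whose cumulative frequency reaches N/2
            width = upper - lower
            return lower + (half - cumulative) / f * width
        cumulative += f
    raise ValueError("median class not found")


print(round(grouped_median(classes), 3))
```

The formula interpolates inside the median class, assuming the observations are spread evenly across its width.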
Mode of Grouped Data
Midpoint of the modal class
Modal class has the greatest frequency
Class Interval Frequency
20-under 30 6
30-under 40 18
40-under 50 11
50-under 60 11
60-under 70 3
70-under 80 1
$Mode = \frac{30 + 40}{2} = 35$

Variance and Standard Deviation
of Grouped Data
$S^2 = \frac{\sum f(M - \bar{X})^2}{n - 1}$
$S = \sqrt{S^2}$

Can we determine the standard deviation, considering this as a population?

Population Variance and Standard
Deviation of Grouped Data
Class Interval f M fM (M-µ) (M-µ)² f(M-µ)²
20-under 30 6 25 150 -18 324 1944
30-under 40 18 35 630 -8 64 1152
40-under 50 11 45 495 2 4 44
50-under 60 11 55 605 12 144 1584
60-under 70 3 65 195 22 484 1452
70-under 80 1 75 75 32 1024 1024
Total 50 2150 7200
$\sigma^2 = \frac{\sum f(M - \mu)^2}{N} = \frac{7200}{50} = 144$
$\sigma = \sqrt{\sigma^2} = \sqrt{144} = 12$
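The grouped variance calculation can be sketched with the same table layout. The helper name `grouped_variance` and its `sample` flag are assumptions for illustration: with `sample=False` the sum is divided by N (population), and with `sample=True` by n - 1 (sample).

```python
import math

# Each entry: ((lower limit, upper limit), frequency)
classes = [((20, 30), 6), ((30, 40), 18), ((40, 50), 11),
           ((50, 60), 11), ((60, 70), 3), ((70, 80), 1)]


def grouped_variance(classes, sample=False):
    """Variance of grouped data from class midpoints and frequencies."""
    n = sum(f for _, f in classes)
    denominator = n - 1 if sample else n
    if denominator <= 0:
        raise ValueError("not enough observations")
    mean = sum(f * (lower + upper) / 2 for (lower, upper), f in classes) / n
    # Sum of f * (M - mean)^2 over all classes
    weighted_ssd = sum(f * ((lower + upper) / 2 - mean) ** 2
                       for (lower, upper), f in classes)
    return weighted_ssd / denominator


population_variance = grouped_variance(classes)
population_std = math.sqrt(population_variance)
print(population_variance, population_std)
print(round(grouped_variance(classes, sample=True), 2))
```

With 50 observations the two denominators differ only slightly, so the sample and population results are close; the gap widens for small data sets.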

Measures of Shape
Symmetrical – the right half is a mirror image of the
left half
Skewed – shows that the distribution lacks
symmetry; used to denote the data is sparse at one
end, and piled at the other end
Absence of symmetry
Extreme values or “tail” in one side of a distribution
Positively- or right-skewed vs. negatively- or left-skewed
[Figure: two distribution curves side by side; x-axis from 0 to 20, y-axis from 0.00 to 0.15]

Coefficient of Skewness
Coefficient of Skewness (Sk) - compares the mean
and median in light of the magnitude of the standard
deviation; Md is the median; Sk is the coefficient of
skewness; σ is the standard deviation
$S_k = \frac{3(\mu - M_d)}{\sigma}$

Coefficient of Skewness
Summary measure for skewness
$S_k = \frac{3(\mu - M_d)}{\sigma}$

If Sk < 0, the distribution is negatively skewed (skewed to
the left).
If Sk = 0, the distribution is symmetric (not skewed). If Sk is
close to 0, it’s almost symmetric
If Sk > 0, the distribution is positively skewed (skewed to
the right).
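The sign rule can be tried on the cars in service data from Demonstration Problem 3.1. This sketch treats the 13 values as a population and uses `statistics.pstdev` for σ; the function name `coefficient_of_skewness` is an illustrative choice.

```python
import statistics


def coefficient_of_skewness(values):
    """Sk = 3 * (mean - median) / population standard deviation."""
    mu = statistics.mean(values)
    md = statistics.median(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return 3 * (mu - md) / sigma


cars_in_service = [643_000, 327_000, 233_000, 204_000, 167_000, 144_000,
                   20_000, 12_000, 10_000, 9_000, 9_000, 7_000, 6_000]

sk = coefficient_of_skewness(cars_in_service)
if sk > 0:
    shape = "positively skewed (skewed to the right)"
elif sk < 0:
    shape = "negatively skewed (skewed to the left)"
else:
    shape = "symmetric"
print(round(sk, 2), shape)
```

The sign comes entirely from the numerator: when a few large values pull the mean above the median, Sk is positive.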

The effectiveness of district attorneys can be measured
by several variables, including:
1. The number of convictions per month
2. The number of cases handled per month
3. The total number of years of conviction per month
A researcher uses a sample of five district attorneys in
a city and determines the total number of years of
conviction that each attorney won against defendants
during the past month, as reported in the first column
in the following tabulations. Compute the mean
absolute deviation, the variance, and the standard
deviation for these figures.
