Measures of Location

Describing Data Using
Numerical Measures
Chapter Goals
After completing this chapter, you should be able
to:
Compute and interpret the mean, median, and mode
for a set of data
Compute the range, variance, and standard deviation
and know what these values mean
Construct and interpret a box and whiskers plot
Compute and explain the coefficient of variation and
z scores
Use numerical measures along with graphs, charts, and
tables to describe data
Chapter Topics
Measures of Center and Location
Mean, median, mode, geometric mean,
midrange
Other measures of Location

Weighted mean, percentiles, quartiles
Measures of Variation
Range, interquartile range, variance and
standard deviation, coefficient of variation
What Is Statistics?
Statistics is the science
of collecting, organizing,
analyzing, and
interpreting data in order
to make decisions.
Important Terms
Population
The collection of all responses,
measurements, or counts that are of interest.
Sample
A portion or subset of the population.
Important Terms
Parameter
A number that describes a population characteristic.
Average gross income of all
people in the United States in
2002.
Statistic
A number that describes a sample characteristic.
2002 gross income of people
from a sample of three states.
Two Branches of Statistics

Descriptive Statistics
Involves organizing, summarizing, and displaying
data.
Inferential Statistics
Involves using sample data to draw conclusions
about a population.
Summary Measures
Describing Data Numerically
Center and Location: Mean, Median, Mode, Weighted Mean
Other Measures of Location: Percentiles, Quartiles
Variation: Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation
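Measures of Center and Location
Overview
Center and Location
Mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$ (sample), $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$ (population)
Median
Mode
Weighted Mean: $\bar{X}_W = \frac{\sum w_i x_i}{\sum w_i}$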
Mean (Arithmetic Average)

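The mean is the arithmetic average of the data values
Sample mean (n = sample size):
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$
Population mean (N = population size):
$\mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + \cdots + x_N}{N}$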
Coding Method
Change of origin:
Sample mean $\bar{X} = A + \frac{\sum u}{n}$
where u = x - A (change of origin)
and A = a guessed value, chosen as close as possible to the central value of the data
Change of origin and scale:
Sample mean $\bar{X} = A + h \cdot \frac{\sum u'}{n}$
where u' = (x - A)/h
h = class width
For grouped data:
Sample mean $\bar{X} = A + h \cdot \frac{\sum f u'}{n}$
where u' = (x - A)/h, f = class frequency, n = total frequency
h = class width
Note that x = mid value of the class interval
= 1/2 (lower limit + upper limit)
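The coding method for grouped data can be sketched in Python as follows. The class limits and frequencies are hypothetical values chosen for illustration (they are not from the slides), and the function names are assumptions. The sketch checks that the coded mean agrees with the mean computed directly from the class midpoints.

```python
# Hypothetical grouped data (illustrative only, not from the slides).
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [3, 7, 12, 6, 2]

def coded_mean(classes, freqs, A, h):
    """Grouped mean by change of origin and scale: A + h * sum(f*u') / n."""
    n = sum(freqs)
    total = 0.0
    for (lower, upper), f in zip(classes, freqs):
        x = (lower + upper) / 2   # class midpoint
        u = (x - A) / h           # coded value u'
        total += f * u
    return A + h * total / n

def direct_mean(classes, freqs):
    """Grouped mean computed directly from midpoints: sum(f*x) / n."""
    n = sum(freqs)
    return sum(f * (lo + hi) / 2 for (lo, hi), f in zip(classes, freqs)) / n

coded = coded_mean(classes, freqs, A=25, h=10)   # A = guessed central value
direct = direct_mean(classes, freqs)
print(coded, direct)
```

Any choice of A gives the same mean; choosing A near the center and h equal to the class width only keeps the coded values u' small.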
Mean (Arithmetic Average)

The most common measure of central tendency
Mean = sum of values divided by the number of
values
Affected by extreme values (outliers)
Data 1, 2, 3, 4, 5: Mean = (1 + 2 + 3 + 4 + 5)/5 = 15/5 = 3
Data 1, 2, 3, 4, 10: Mean = (1 + 2 + 3 + 4 + 10)/5 = 20/5 = 4
Median
Not affected by extreme values
[Figure: two dot plots on a 0 to 10 axis; Median = 3 in both]
In an ordered array, the median is the middle number
If n or N is odd, the median is the middle number
If n or N is even, the median is the average of the two middle numbers
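A minimal Python sketch of the odd/even rule, applied to the two data sets from the Mean slide (1, 2, 3, 4, 5 and 1, 2, 3, 4, 10). The function name `median_of` is an illustrative assumption.

```python
from statistics import mean

def median_of(values):
    """Median by the odd/even rule applied to the ordered array."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd n: the middle number
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average of two middle numbers

a = [1, 2, 3, 4, 5]
b = [1, 2, 3, 4, 10]   # same data with one extreme value
print(mean(a), median_of(a))
print(mean(b), median_of(b))
```

The extreme value moves the mean from 3 to 4, while the median stays at 3.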
For grouped data, the median is calculated using the following relationship:
$\text{Median} = l + \frac{n/2 - cf}{f} \cdot h$
where l = lower limit of the median class
n/2 = median position
cf = cumulative frequency preceding the median class
f = median class frequency
h = class width of the median class
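The grouped-data median can be sketched in Python as below. The frequency table is hypothetical (not from the slides) and the function name is an assumption. The median class is taken as the first class whose cumulative frequency reaches n/2.

```python
# Hypothetical grouped data (illustrative only, not from the slides).
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [3, 7, 12, 6, 2]

def grouped_median(classes, freqs):
    """Median = l + (n/2 - cf) / f * h for the median class."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("empty frequency table")
    half = n / 2          # median position
    cf = 0                # cumulative frequency before the current class
    for (lower, upper), f in zip(classes, freqs):
        if f > 0 and cf + f >= half:
            h = upper - lower   # class width of the median class
            return lower + (half - cf) / f * h
        cf += f
    raise ValueError("median class not found")

med = grouped_median(classes, freqs)
print(round(med, 4))
```

Here n = 30, so the median position is 15. The median class is 20 to 30 (cf = 10, f = 12), which gives 20 + (15 - 10)/12 × 10.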
Mode
A measure of central tendency

Value that occurs most often
Not affected by extreme values
Used for either numerical or
categorical data
There may be no mode
There may be several modes
[Figure: dot plot on a 0 to 14 axis with Mode = 5; dot plot on a 0 to 6 axis with No Mode]
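For grouped data, the mode is computed using the following relation:
$\text{Mode} = l + \frac{\Delta_1}{\Delta_1 + \Delta_2} \cdot h$
where l = lower limit of the modal class
$\Delta_1 = f_1 - f_0$
$\Delta_2 = f_1 - f_2$
$f_1$ = modal class frequency (maximum frequency)
$f_0$ = frequency of the class preceding the modal class
$f_2$ = frequency of the class succeeding the modal class
h = class width of the modal class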
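A minimal Python sketch of the grouped-data mode. The frequency table is hypothetical and the function name is an assumption. As a further assumption, a missing neighbouring class (when the modal class is first or last) is treated as having frequency 0.

```python
# Hypothetical grouped data (illustrative only, not from the slides).
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [3, 7, 12, 6, 2]

def grouped_mode(classes, freqs):
    """Mode = l + d1 / (d1 + d2) * h, with d1 = f1 - f0 and d2 = f1 - f2."""
    m = max(range(len(freqs)), key=lambda i: freqs[i])  # index of the modal class
    f1 = freqs[m]
    f0 = freqs[m - 1] if m > 0 else 0                   # preceding class frequency
    f2 = freqs[m + 1] if m + 1 < len(freqs) else 0      # succeeding class frequency
    d1, d2 = f1 - f0, f1 - f2
    if d1 + d2 == 0:
        raise ValueError("no unique modal class")
    lower, upper = classes[m]
    return lower + d1 / (d1 + d2) * (upper - lower)

mode_value = grouped_mode(classes, freqs)
print(round(mode_value, 4))
```

For this table the modal class is 20 to 30 with f1 = 12, f0 = 7, f2 = 6, so the mode is 20 + 5/11 × 10.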
Weighted Mean
Used when values are grouped by
frequency or relative importance
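Example: Sample of 26 Repair Projects
Days to Complete    Frequency (Weight)
5                   4
6                   12
7                   8
8                   2
Weighted Mean Days to Complete:
$\bar{X}_W = \frac{\sum w_i x_i}{\sum w_i} = \frac{(4 \times 5) + (12 \times 6) + (8 \times 7) + (2 \times 8)}{4 + 12 + 8 + 2} = \frac{164}{26} = 6.31 \text{ days}$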
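The repair-project calculation can be reproduced with a short Python sketch. The function name `weighted_mean` is an illustrative assumption; the values and weights are the ones used in the calculation above.

```python
days = [5, 6, 7, 8]        # days to complete
weights = [4, 12, 8, 2]    # number of projects taking that many days

def weighted_mean(values, weights):
    """Weighted mean: sum(w * x) / sum(w)."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight

result = weighted_mean(days, weights)
print(round(result, 2))    # 164 / 26, rounded to two decimals
```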
Review Example
Five houses on a hill by the beach
House Prices:
$2,000,000
500,000
300,000
100,000
100,000
Summary Statistics
House Prices:
$2,000,000
500,000
300,000
100,000
100,000
Sum 3,000,000
Mean: ($3,000,000/5)
= $600,000
Median: middle value of ranked
data
= $300,000
Mode: most frequent value
= $100,000
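The three summary statistics for the house prices can be checked with Python's standard `statistics` module. This is only a verification sketch; the variable names are illustrative.

```python
from statistics import mean, median, mode

prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

summary = {
    "mean": mean(prices),      # sum of values divided by the number of values
    "median": median(prices),  # middle value of the ranked data
    "mode": mode(prices),      # most frequent value
}
print(summary)
```

The single $2,000,000 house pulls the mean to twice the median, which is why the median is often preferred for data like this.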
Which measure of location is the best?
Mean is generally used, unless extreme values (outliers) exist
Then median is often used, since the median is not sensitive to extreme values
Example: Median home prices may be reported for a region because the median is less sensitive to outliers
Shape of a Distribution
Describes how data is distributed
Symmetric or skewed
Left-Skewed: Mean < Median < Mode (longer tail extends to left)
Symmetric: Mean = Median = Mode
Right-Skewed: Mode < Median < Mean (longer tail extends to right)
Other Location Measures

Percentiles
The pth percentile in a data array:
p% are less than or equal to this value
(100 - p)% are greater than or equal to this value
(where 0 ≤ p ≤ 100)
Quartiles
1st quartile = 25th percentile
2nd quartile = 50th percentile = median
3rd quartile = 75th percentile
Percentiles
The pth percentile in an ordered array of n values is the value in the ith position, where
$i = \frac{p}{100}(n + 1)$
Example: The 60th percentile in an ordered array of 19 values is the value in the 12th position:
$i = \frac{p}{100}(n + 1) = \frac{60}{100}(19 + 1) = 12$
Quartiles
Quartiles split the ranked data into 4 equal groups
[Figure: ranked data divided into four 25% segments by Q1, Q2, Q3]
Example: Find the first quartile
Sample Data in Ordered Array: 11 12 13 16 16 17 18 21 22
(n = 9)
Q1 = 25th percentile, so find the value in the $\frac{25}{100}(9 + 1) = 2.5$ position,
so use the value halfway between the 2nd and 3rd values,
so Q1 = 12.5
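The position rule i = (p/100)(n + 1) can be sketched in Python as follows. The function names are illustrative. Linear interpolation for fractional positions is an assumption that generalizes the "halfway between the 2nd and 3rd values" step of the example; the slides show only the halfway case.

```python
def percentile_position(p, n):
    """1-based position of the pth percentile in an ordered array of n values."""
    return p * (n + 1) / 100

def percentile(ordered, p):
    """Value at position (p/100)(n+1), interpolating between neighbours."""
    n = len(ordered)
    i = percentile_position(p, n)
    lo = int(i)              # whole part of the position
    frac = i - lo            # fractional part of the position
    if lo < 1:
        return ordered[0]    # position falls before the first value
    if lo >= n:
        return ordered[-1]   # position falls at or after the last value
    return ordered[lo - 1] + frac * (ordered[lo] - ordered[lo - 1])

data = [11, 12, 13, 16, 16, 17, 18, 21, 22]
print(percentile_position(60, 19))   # position for the 60th percentile, n = 19
print(percentile(data, 25))          # first quartile of the sample data
```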
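For grouped data, the different partition values are computed using the following relations, where l, cf, f, and h refer to the class that contains the partition value:
Quartiles (4 equal divisions):
$Q_i = l + \frac{i \cdot n/4 - cf}{f} \cdot h$, i = 1, 2, 3
Deciles (10 equal divisions):
$D_k = l + \frac{k \cdot n/10 - cf}{f} \cdot h$, k = 1, 2, ..., 9
Percentiles (100 equal divisions):
$P_t = l + \frac{t \cdot n/100 - cf}{f} \cdot h$, t = 1, 2, ..., 99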
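Quartiles, deciles, and percentiles for grouped data share one pattern, so a single function can cover all three. The sketch below uses a hypothetical frequency table, and the name `grouped_partition` and its `parts` parameter are assumptions made for illustration.

```python
# Hypothetical grouped data (illustrative only, not from the slides).
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [3, 7, 12, 6, 2]

def grouped_partition(classes, freqs, k, parts):
    """k-th of `parts` partition values: l + (k*n/parts - cf) / f * h."""
    n = sum(freqs)
    if n == 0 or not 0 < k < parts:
        raise ValueError("need data and 0 < k < parts")
    position = k * n / parts
    cf = 0   # cumulative frequency before the current class
    for (lower, upper), f in zip(classes, freqs):
        if f > 0 and cf + f >= position:
            return lower + (position - cf) / f * (upper - lower)
        cf += f
    raise ValueError("partition class not found")

q1 = grouped_partition(classes, freqs, 1, 4)     # first quartile
q3 = grouped_partition(classes, freqs, 3, 4)     # third quartile
d5 = grouped_partition(classes, freqs, 5, 10)    # fifth decile
p50 = grouped_partition(classes, freqs, 50, 100) # 50th percentile
print(round(q1, 4), round(q3, 4), round(d5, 4), round(p50, 4))
```

The fifth decile and the 50th percentile coincide with the second quartile, as expected.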
Box and Whisker Plot

A graphical display of data using the 5-number summary:
Minimum -- Q1 -- Median -- Q3 -- Maximum
Example:
[Figure: box-and-whisker plot marking Minimum, 1st Quartile, Median, 3rd Quartile, and Maximum, with 25% of the data in each of the four segments]
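As a sketch, the 5-number summary can be computed with Python's standard library for the sample data used in the quartile example (11 12 13 16 16 17 18 21 22). The `"exclusive"` method of `statistics.quantiles` (Python 3.8+) is used on the assumption that it corresponds to the (n + 1) position rule; for this data set it gives Q1 = 12.5, as in the worked example.

```python
from statistics import quantiles

data = [11, 12, 13, 16, 16, 17, 18, 21, 22]

# Three cut points that split the ranked data into four groups.
q1, q2, q3 = quantiles(data, n=4, method="exclusive")

five_number_summary = (min(data), q1, q2, q3, max(data))
print(five_number_summary)
```

These five values are exactly what a box-and-whisker plot draws: the box spans Q1 to Q3 with a line at the median, and the whiskers extend to the minimum and maximum.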
Shape of Box and Whisker Plots

The box and central line are centered between the endpoints if the data are symmetric around the median
A box-and-whisker plot can be shown in either vertical or horizontal format
Distribution Shape and Box and Whisker Plot
[Figure: box-and-whisker plots showing the positions of Q1, Q2, Q3 for left-skewed, symmetric, and right-skewed distributions]
Box-and-Whisker Plot Example
[Figure: box-and-whisker plot for a data set ranging from 0 to 27, with Min = 0, Q1 = 2, Q2 = 3, Q3 = 5, Max = 27]
Measures of Variation
Variation: Range, Interquartile Range, Variance (Population Variance, Sample Variance), Standard Deviation (Population Standard Deviation, Sample Standard Deviation), Coefficient of Variation
Variation
Measures of variation give
information on the spread or
variability of the data values.
Same center,
different variation
Range
Simplest measure of variation
Difference between the largest and
the smallest observations:
$\text{Range} = x_{\text{maximum}} - x_{\text{minimum}}$
Example:
[Figure: dot plot on a 0 to 14 axis]
Range = 14 - 1 = 13
Disadvantages of the Range
Ignores the way in which data are distributed
[Figure: two dot plots with different distributions over the same axis; Range = 12 - 7 = 5 in both]
Sensitive to outliers
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119
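The outlier sensitivity of the range is easy to reproduce. The Python sketch below uses the two sequences from the slide; the function name `data_range` is illustrative.

```python
def data_range(values):
    """Range = largest observation minus smallest observation."""
    return max(values) - min(values)

base = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        2, 2, 2, 2, 2, 2, 2, 2,
        3, 3, 3, 3, 4, 5]
with_outlier = base[:-1] + [120]   # same data, last value replaced by 120

print(data_range(base))
print(data_range(with_outlier))
```

Changing a single observation out of 25 moves the range from 4 to 119, even though the rest of the data is identical.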
Interquartile Range
Can eliminate some outlier problems by using the interquartile range
Eliminate some high- and low-valued observations and calculate the range from the remaining values
Interquartile range = 3rd quartile - 1st quartile
Interquartile Range
Example:
[Figure: box-and-whisker plot with X minimum = 12, Q1 = 30, Median (Q2) = 45, Q3 = 57, X maximum = 70; 25% of the data in each segment]
Interquartile range = 57 - 30 = 27
Variance
Average of squared deviations of values from the mean (individual series)
Sample variance:
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
Population variance:
$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
Standard Deviation
Most commonly used measure of variation
Shows variation about the mean
Has the same units as the original data
Sample standard deviation (ungrouped data):
$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
Population standard deviation:
$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$
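For grouped data, the standard deviation is computed using the following relationships:
Sample standard deviation:
$s = \sqrt{\frac{\sum f (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{\sum f x^2}{n - 1} - \frac{(\sum f x)^2}{n(n - 1)}}$
Population standard deviation:
$\sigma = \sqrt{\frac{\sum f (x - \mu)^2}{N}} = \sqrt{\frac{\sum f x^2}{N} - \left(\frac{\sum f x}{N}\right)^2}$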
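Coding Method
Sample std. dev. $s = \sqrt{\frac{\sum f u^2}{n - 1} - \frac{(\sum f u)^2}{n(n - 1)}}$
where u = x - A
and A = guess value
Sample standard deviation $s = h \sqrt{\frac{\sum f u'^2}{n - 1} - \frac{(\sum f u')^2}{n(n - 1)}}$
where u' = (x - A)/h
Population std. dev. $\sigma = \sqrt{\frac{\sum f u^2}{N} - \left(\frac{\sum f u}{N}\right)^2}$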
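The sketch below compares the coded computation of the grouped sample standard deviation with the direct definition. The frequency table is hypothetical and the function names are assumptions. When the values are scaled by the class width, u' = (x - A)/h, the coded result has to be multiplied back by h to return to the original units.

```python
from math import sqrt

# Hypothetical grouped data (illustrative only, not from the slides).
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [3, 7, 12, 6, 2]
mids = [(lo + hi) / 2 for lo, hi in classes]   # class midpoints x

def sd_direct(mids, freqs):
    """s = sqrt(sum f (x - mean)^2 / (n - 1))."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(mids, freqs)) / n
    ss = sum(f * (x - mean) ** 2 for x, f in zip(mids, freqs))
    return sqrt(ss / (n - 1))

def sd_coded(mids, freqs, A, h):
    """s = h * sqrt(sum f u'^2 / (n-1) - (sum f u')^2 / (n (n-1)))."""
    n = sum(freqs)
    u = [(x - A) / h for x in mids]            # coded values u'
    sum_fu = sum(f * ui for ui, f in zip(u, freqs))
    sum_fu2 = sum(f * ui ** 2 for ui, f in zip(u, freqs))
    return h * sqrt(sum_fu2 / (n - 1) - sum_fu ** 2 / (n * (n - 1)))

direct = sd_direct(mids, freqs)
coded = sd_coded(mids, freqs, A=25, h=10)
print(round(direct, 4), round(coded, 4))
```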
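Calculation Example: Sample Standard Deviation
Sample Data ($x_i$): 10 12 14 15 17 18 18 24
n = 8, Mean = $\bar{x}$ = 16
$s = \sqrt{\frac{(10 - \bar{x})^2 + (12 - \bar{x})^2 + (14 - \bar{x})^2 + \cdots + (24 - \bar{x})^2}{n - 1}}$
$= \sqrt{\frac{(10 - 16)^2 + (12 - 16)^2 + (14 - 16)^2 + \cdots + (24 - 16)^2}{8 - 1}}$
$= \sqrt{\frac{130}{7}} \approx 4.3095$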
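The arithmetic of this example can be checked in Python. The sketch recomputes the sum of squared deviations term by term and compares the result with `statistics.stdev`, which uses the same n - 1 divisor. For this data the squared deviations are 36, 16, 4, 1, 1, 4, 4, 64, which sum to 130.

```python
from math import sqrt
from statistics import mean, stdev

data = [10, 12, 14, 15, 17, 18, 18, 24]

x_bar = mean(data)                                # sample mean
squared_devs = [(x - x_bar) ** 2 for x in data]   # (x_i - x_bar)^2 for each value
ss = sum(squared_devs)                            # sum of squared deviations
s = sqrt(ss / (len(data) - 1))                    # sample standard deviation

print(x_bar, ss, round(s, 4))
```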
Comparing Standard Deviations
[Figure: three dot plots on an 11 to 21 axis, each with mean 15.5]
Data A: Mean = 15.5, s = 3.338
Data B: Mean = 15.5, s = 0.9258
Data C: Mean = 15.5, s = 4.57
Coefficient of Variation
Measures relative variation

Always in percentage (%)
Shows variation relative to mean
Is used to compare two or more sets
of data measured in different units
Population: $CV = \left(\frac{\sigma}{\mu}\right) \cdot 100\%$
Sample: $CV = \left(\frac{s}{\bar{x}}\right) \cdot 100\%$
Comparing Coefficient of Variation
Stock A:
Average price last year = $50
Standard deviation = $5
$CV_A = \left(\frac{s}{\bar{x}}\right) \cdot 100\% = \frac{\$5}{\$50} \cdot 100\% = 10\%$
Stock B:
Average price last year = $100
Standard deviation = $5
$CV_B = \left(\frac{s}{\bar{x}}\right) \cdot 100\% = \frac{\$5}{\$100} \cdot 100\% = 5\%$
Both stocks have the same standard deviation, but stock B is less variable relative to its price
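The stock comparison can be written as a small Python sketch. The function name `coefficient_of_variation` is illustrative; the inputs are the mean prices and standard deviations given for the two stocks.

```python
def coefficient_of_variation(std_dev, mean):
    """CV as a percentage: (standard deviation / mean) * 100."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100 * std_dev / mean

cv_a = coefficient_of_variation(5, 50)    # Stock A: s = $5, mean = $50
cv_b = coefficient_of_variation(5, 100)   # Stock B: s = $5, mean = $100
print(cv_a, cv_b)
```

Because the CV is a ratio, the dollar units cancel, which is what makes it usable for comparing data sets measured in different units.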
The Empirical Rule

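If the data distribution is bell-shaped, then the interval:
$\mu \pm 1\sigma$ contains about 68% of the values in the population or the sample
[Figure: bell curve with 68% of the values within $\mu \pm 1\sigma$]
The Empirical Rule
$\mu \pm 2\sigma$ contains about 95% of the values in the population or the sample
$\mu \pm 3\sigma$ contains about 99.7% of the values in the population or the sample
[Figure: bell curves with 95% of the values within $\mu \pm 2\sigma$ and 99.7% within $\mu \pm 3\sigma$]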
Tchebysheff's Theorem
Regardless of how the data are distributed, at least $(1 - 1/k^2)$ of the values will fall within k standard deviations of the mean
Examples: At least
$(1 - 1/1^2) = 0\%$ of the values fall within $\mu \pm 1\sigma$ (k = 1)
$(1 - 1/2^2) = 75\%$ of the values fall within $\mu \pm 2\sigma$ (k = 2)
$(1 - 1/3^2) \approx 89\%$ of the values fall within $\mu \pm 3\sigma$ (k = 3)
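A short Python sketch of the bound, with a check against one data set. The function names are illustrative. The data are the eight values from the standard deviation calculation example, treated here as a complete population (hence `pstdev`), which is an assumption made for the check.

```python
from statistics import mean, pstdev

def chebyshev_bound(k):
    """Minimum fraction of values within k standard deviations of the mean."""
    if k < 1:
        raise ValueError("the bound is only informative for k >= 1")
    return 1 - 1 / k ** 2

def fraction_within(data, k):
    """Observed fraction of values within k population std. devs. of the mean."""
    mu, sigma = mean(data), pstdev(data)
    inside = [x for x in data if abs(x - mu) <= k * sigma]
    return len(inside) / len(data)

data = [10, 12, 14, 15, 17, 18, 18, 24]
for k in (1, 2, 3):
    print(k, round(chebyshev_bound(k), 4), fraction_within(data, k))
```

The observed fractions are never below the bound. The bound is deliberately conservative because it has to hold for every possible distribution.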
Standardized Data Values

A standardized data value refers
to the number of standard
deviations a value is from the
mean
Standardized data values are
sometimes referred to as z-scores
Standardized Population Values
$z = \frac{x - \mu}{\sigma}$
where:
x = original data value
μ = population mean
σ = population standard deviation
z = standard score (number of standard deviations x is from μ)
Standardized Sample Values
$z = \frac{x - \bar{x}}{s}$
where:
x = original data value
$\bar{x}$ = sample mean
s = sample standard deviation
z = standard score (number of standard deviations x is from $\bar{x}$)
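Standardizing a sample takes only a few lines of Python. The sketch uses the eight values from the standard deviation calculation example; the function name `z_score` is illustrative.

```python
from statistics import mean, stdev

def z_score(x, center, spread):
    """Number of standard deviations that x lies from the mean."""
    if spread == 0:
        raise ValueError("standard deviation must be non-zero")
    return (x - center) / spread

data = [10, 12, 14, 15, 17, 18, 18, 24]
x_bar = mean(data)    # sample mean
s = stdev(data)       # sample standard deviation (n - 1 divisor)

z_values = [z_score(x, x_bar, s) for x in data]
for x, z in zip(data, z_values):
    print(x, round(z, 3))
```

Values below the mean get negative z-scores, values above it get positive ones, and the z-scores of a data set always sum to zero.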
Chapter Summary
Described measures of center and location
Mean, median, mode, geometric mean, midrange
Discussed percentiles and quartiles
Described measures of variation
Range, interquartile range, variance, standard deviation, coefficient of variation
Created Box and Whisker Plots
Chapter Summary
Illustrated distribution shapes
Symmetric, skewed
Discussed Tchebysheff's Theorem
Calculated standardized data values
