Professional Documents
Culture Documents
Chapter - 02 - Describing Data With Numerical Measures
Chapter - 02 - Describing Data With Numerical Measures
and Statistics
Fourteenth Edition
Chapter 2
Describing Data
with Numerical Measures
Some graphic screen captures from Seeing Statistics ® Copyright ©2006 Brooks/Cole
Some images © 2001-(current year) www.arttoday.com A division of Thomson Learning, Inc.
Describing Data with Numerical
Measures
• Graphs can help you describe the basic shape of a data
distribution; “a picture is worth a thousand words.”
• Graphical methods may not always be sufficient for
describing data.
• Numerical measures can be created for both
populations and samples.
– A parameter is a numerical descriptive measure
calculated for a population.
population
– A statistic is a numerical descriptive measure
calculated for a sample.
sample
Copyright ©2006 Brooks/Cole
A division of Thomson Learning, Inc.
Measures of Center
• Dotplots, stem and leaf plots, and histograms are
used to describe the distribution of a set of
measurements on a quantitative variable.
• Measure of center - a measure along the
horizontal axis of the data distribution that locates
the center of the distribution.
xi 1 n x
x x xi
n i 1
N
n
where n = number of measurements
xi sum of all the measuremen ts
Copyright ©2006 Brooks/Cole
A division of Thomson Learning, Inc.
Trimmed Mean
• Weighted arithmetic mean: n
w x i i
x i 1
n
• Trimmed mean:
w
i 1
i
xi 2 9 11 5 6 33
x 6 .6
n 5 5
n 25 8/25
Relative frequency
• Median? 6/25
m2 4/25
mode 2
0
0 1 2 3 4 5
Quarts
positively
negatively
skewed
skewed
45
x 9
5
4 6 8 10 12 14
Copyright ©2006 Brooks/Cole
A division of Thomson Learning, Inc.
The Variance
• Definition The variance of a population of N
measurements is the average of the squared deviations
of the measurements about their mean
2
2 ( x )
i
N
• Definition The variance of a sample of n
measurements is the sum of the squared deviations
of the measurements about their mean , divided
by (n – 1) 2
2 ( xi x )
s
n 1
Copyright ©2006 Brooks/Cole
A division of Thomson Learning, Inc.
The Variance
• Flower petals: 5, 7, 1, 2, 4
x 44.9 12/50
Relative frequency
10/50
s 10.73
8/50
6/50
4/50
0
25 33 Copyright
41 ©2006
49 Brooks/Cole
57 65 73
Ages
A division of Thomson Learning, Inc.
Age Tally Frequency Relative Percent
Frequency
25 to < 33 1111 5 5/50 = .10 10%
33 to < 41 1111 1111 1111 14 14/50 = .28 28%
41 to < 49 1111 1111 111 13 13/50 = .26 26%
49 to < 57 1111 1111 9 9/50 = .18 18%
57 to < 65 1111 11 7 7/50 = .14 14%
65 to < 73 11 2 2/50 = .04 4%
•Yes. Tchebysheff’s
•Do the actual proportions in the three
Theorem must be
intervals agree with those given by true for any data
Tchebysheff’s Theorem? set.
•Do they agree with the Empirical •No. Not very well.
Rule?
•The data distribution is not very
•Why or why not? mound-shaped, but skewed right.
Actual s = 10.73
Copyright ©2006 Brooks/Cole
A division of Thomson Learning, Inc.
Approximating s
• The range approximation is not intended to provide
an accurate value for s.
• Rather, its purpose is to detect gross errors in
calculating, such as the failure to divide the sum of
squares of deviations by (n - 1) or the failure to take
the square root of s2.
• For one of these mistakes, the answer will be many
times larger than the range approximation of s.
• The range approximation for s can be improved if it
is known that the sample is drawn from a mound-
shaped distribution of data. Copyright ©2006 Brooks/Cole
A division of Thomson Learning, Inc.
Approximating s
• The range for a sample of n measurements will
depend on the sample size, n. For larger values of n,
a larger range of the x values is expected.
• Divisor for the Range Approximation of s
Number of Measurements Expected Ratio of Range to s
5 2.5
10 3
25 4
>=50 6
Q1 m Q3
Copyright ©2006 Brooks/Cole
A division of Thomson Learning, Inc.
Constructing a Box Plot
Isolate outliers by calculating
Lower fence: Q1-1.5 IQR
Upper fence: Q3+1.5 IQR
Measurements beyond the upper or lower fence are
outliers and are marked (*).
*
Q1 m Q3
*
Q1 m Q3
m
Q1 Q3
Copyright ©2006 Brooks/Cole
A division of Thomson Learning, Inc.
Example
IQR = 340-292.5 = 47.5
Lower fence = 292.5-1.5(47.5) = 221.25
Upper fence = 340 + 1.5(47.5) = 411.25
Outlier: x = 520
*
m
Q1 Q3
Copyright ©2006 Brooks/Cole
A division of Thomson Learning, Inc.
Interpreting Box Plots
Box plot can be used to describe the shape of a data
distribution by looking at the position of the median line
compared to Q1 and Q3.
Median line in center of box and whiskers of equal
length—symmetric distribution
Median line left of center and long right whisker—
skewed right
Median line right of center and long left whisker—
skewed left
For most skewed distributions, the whisker on the
skewed side of the box tends to be longer than the
whisker on the other side.
2 ( xi ) 2
2 xi
( x x ) n
s2 i
n 1 n 1
3. Standard deviation
Population standard deviation : 2
Sample standard deviation : s s 2
4. A rough approximation for s can be calculated as s R / 4.
The divisor can be adjusted depending on the sample size.
Copyright ©2006 Brooks/Cole
A division of Thomson Learning, Inc.
Key Concepts
III. Tchebysheff’s Theorem and the Empirical Rule
1. Use Tchebysheff’s Theorem for any data set, regardless of
its shape or size.
a. At least 1-(1/k 2 ) of the measurements lie within k
standard deviation of the mean.
b. This is only a lower bound; there may be more
measurements in the interval.
2. The Empirical Rule can be used only for relatively mound-
shaped data sets.
– Approximately 68%, 95%, and 99.7% of the measurements
are within one, two, and three standard deviations of the
mean, respectively.
Copyright ©2006 Brooks/Cole
A division of Thomson Learning, Inc.
Key Concepts
IV. Measures of Relative Standing
1. Sample z-score:
2. pth percentile; p% of the measurements are smaller, and
(100 p)% are larger.
3. Lower quartile, Q 1; position of Q 1 .25(n 1)
4. Upper quartile, Q 3 ; position of Q 3 .75(n 1)
5. Interquartile range: IQR Q 3 Q 1
V. Box Plots
1. The five-number summary:
Min Q1 Median Q3 Max
One-fourth of the measurements in the data set lie between each
of the four adjacent pairs of numbers.