Chap2 Numerical Summary Measures Upload
CHAPTER 2
NUMERICAL SUMMARY
MEASURES
Outline of Chapter 2
The mean suffers from one deficiency that makes it an inappropriate measure of center
under some circumstances: Its value can be greatly affected by the presence of even a single
outlier (unusually large or small observation).
➢ In Example 2.1, the value x₂ = 30.5 is obviously an outlier. Without this observation, 𝑥̅ = 198.5/13 = 15.27;
with it, 𝑥̅ = 229.0/14 = 16.36, so the outlier increases the mean by more than 1 (about 7%). If the 30.5 observation
were replaced by 90.0, a really extreme outlier, then 𝑥̅ = 288.5/14 = 20.61, which is larger
than any of the other observations!
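The arithmetic behind these three means can be checked with a short sketch. It uses only the totals quoted above (the individual observations of Example 2.1 are not reproduced here), and the variable names are illustrative.

```python
# Totals reconstructed from the figures quoted in the text (n = 14 observations).
n = 14
total_with_90 = 288.5                     # sum when 30.5 is replaced by 90.0
total_without = total_with_90 - 90.0      # sum of the other 13 observations
total_original = total_without + 30.5     # sum of the original sample

mean_without = total_without / (n - 1)    # outlier removed
mean_original = total_original / n        # outlier 30.5 included
mean_extreme = total_with_90 / n          # outlier replaced by 90.0

print(round(mean_without, 2), round(mean_original, 2), round(mean_extreme, 2))
```

A single observation moves the mean from about 15.27 to about 20.61 in the extreme case, which is the sensitivity the slide describes.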
2.1 Measures of Center
→ The Sample Median
Since n = 12 is even, the sample median is the average of the n/2 = sixth and (n/2 + 1) = seventh
values from the ordered list:
The sample mean is 𝑥̅ = Σxᵢ/n = 816.1/12 = 68.01, a bit more than a full minute larger than the
median. The mean is pulled out a bit relative to the median because the sample “stretches out”
somewhat more on the upper end than on the lower end.
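The even/odd rule for the sample median can be sketched as below. The function name `sample_median` and the data in `times` are hypothetical; they are not the values from the example on this slide.

```python
def sample_median(values):
    """Middle value of the ordered list (odd n), or the average of the
    (n/2)th and (n/2 + 1)th ordered values (even n)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("sample_median requires at least one observation")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    # 0-based indices mid - 1 and mid are the (n/2)th and (n/2 + 1)th values.
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical sample that stretches out on the upper end.
times = [4, 1, 3, 2, 10, 6]
median = sample_median(times)
mean = sum(times) / len(times)
print(median, mean)
```

In this made-up sample the mean exceeds the median, the same pattern the slide attributes to a longer upper tail.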
2.1 Measures of Center
→ Trimmed Means
When we consider the population of all such parts, the population mean value of x is
0.30. Alternatively, 0.30 is the long-run average value of x when part after part is
monitored.
2.1 Measures of Center
→ Measures of Center for Distributions
→ Discrete Distributions
Just as µ in the discrete case is the balance point for the histogram corresponding to p(x),
in the continuous case µ is the balance point for the density curve corresponding to f (x).
2.1 Measures of Center
→ Measures of Center for Distributions
→ Continuous Distributions → Example 2.5
The latter integral is zero because the integrand g(y) is an odd function (g(-y) = -g(y)),
which gives the desired result.
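The step from oddness to a zero integral can be written out as follows. This is a sketch that assumes g is integrable over the whole line; the substitution y = -u is applied to the left half of the range.

```latex
\int_{-\infty}^{\infty} g(y)\,dy
  = \int_{-\infty}^{0} g(y)\,dy + \int_{0}^{\infty} g(y)\,dy
  = \int_{0}^{\infty} g(-u)\,du + \int_{0}^{\infty} g(y)\,dy
  = -\int_{0}^{\infty} g(u)\,du + \int_{0}^{\infty} g(y)\,dy
  = 0 .
```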
2.1 Measures of Center
→ Measures of Center for Distributions
→ Continuous Distributions
• Reporting a measure of center gives only partial information about a data set or distribution.
• Different samples or distributions may have identical measures of center yet differ from one
another in other important ways.
→ For example, for a normal distribution (𝜇, 𝜎), the normal curve becomes more spread out as
the value of 𝜎 increases.
• Figure 2.2 shows dotplots of three samples with the same mean and median, yet the extent
of spread about the center is different for all three samples.
• The first sample has the largest amount of variability, the third has the smallest amount, and
the second is intermediate to the other two in this respect.
How can we prevent negative and positive deviations from counteracting one
another when they are combined?
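A small numeric sketch shows the problem and the remedy the chapter goes on to use, squaring the deviations. The three data values are hypothetical.

```python
# Hypothetical sample, chosen only to keep the arithmetic easy to follow.
data = [2.0, 4.0, 9.0]
mean = sum(data) / len(data)

# Deviations from the mean: negative and positive values cancel exactly.
deviations = [x - mean for x in data]
sum_of_deviations = sum(deviations)

# Squaring makes every term non-negative, so nothing cancels.
sum_of_squared_deviations = sum(d ** 2 for d in deviations)

print(deviations, sum_of_deviations, sum_of_squared_deviations)
```

The raw deviations always sum to zero, so they carry no information about spread until they are squared.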
Let x be a discrete variable with mass function p(x) and mean value
µ. Just as µ itself is a weighted average of possible x values, where the
weights come from the mass function, the variance σ² is a weighted average
of the squared deviations (x - µ)² for possible x values:
σ² = Σ (x - µ)² · p(x)
where the sum is over all possible x values. The standard deviation is
σ, the positive square root of the variance.
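The weighted-average definitions of the mean, variance, and standard deviation can be sketched as follows. The mass function `pmf` is hypothetical and is not taken from any example in this chapter; the function names are illustrative.

```python
import math


def discrete_mean(pmf):
    """Mean of a discrete distribution: sum of x * p(x) over all x."""
    return sum(x * p for x, p in pmf.items())


def discrete_variance(pmf):
    """Variance: sum of (x - mu)^2 * p(x) over all x."""
    mu = discrete_mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())


# Hypothetical mass function; the probabilities sum to 1.
pmf = {0: 0.5, 1: 0.3, 2: 0.2}

mu = discrete_mean(pmf)
variance = discrete_variance(pmf)
std_dev = math.sqrt(variance)   # positive square root of the variance
print(mu, variance, std_dev)
```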
2.2 Measures of Variability
→ The Variance and Standard Deviation of a Discrete Distribution → Example 2.8
Solution:
2.2 Measures of Variability
→ The Variance and Standard Deviation of a Continuous Distribution
• The median separates a data set or distribution into two equal parts,
so that 50% of the values exceed the median and 50% are smaller
than the median.
• The lower and upper quartiles along with the median separate a data
set or distribution into four equal parts:
• 25% of all values are smaller than the lower quartile,
• 25% exceed the upper quartile, and
• 25% lie between each quartile and the median.
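One way to compute sample quartiles is sketched below. It assumes the halves convention used in this chapter's example: the lower and upper quartiles are the medians of the lower and upper halves of the ordered data, and when n is odd the median is included in both halves. For even n the sketch assumes the data are simply split into two halves of n/2 values. The function name and the data are hypothetical.

```python
from statistics import median


def quartiles(values):
    """Return (lower quartile, median, upper quartile) by the halves method."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("quartiles requires at least two observations")
    half = n // 2
    if n % 2 == 1:
        # Odd n: the middle value belongs to both halves.
        lower_half = ordered[: half + 1]
        upper_half = ordered[half:]
    else:
        lower_half = ordered[:half]
        upper_half = ordered[half:]
    return median(lower_half), median(ordered), median(upper_half)


# Hypothetical data with odd n.
q1, med, q3 = quartiles([1, 2, 3, 4, 5, 6, 7])
iqr = q3 - q1   # interquartile range: upper quartile minus lower quartile
print(q1, med, q3, iqr)
```

Other software may use different quartile conventions, so results can differ slightly from this sketch on small samples.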
Because n = 27 is odd, the median x̃ = 7.7 is included in each half of the
data:
2.3 More Detailed Summary Quantities
→ Quartiles and the Interquartile Range
→ Example 2.10
Given data:
→ We can therefore conclude that since the median score for morning tests is higher
than that for afternoon tests, on average, morning math test scores are higher.
Source: https://www.nagwa.com
2.3 More Detailed Summary Quantities
→ Quartiles and the Interquartile Range
→ Boxplots → Example
Any observation farther than 1.5 IQR from the closest quartile is an
outlier. An outlier is extreme if it is more than 3 IQR from the nearest
quartile, and it is mild otherwise.
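The mild/extreme rule can be sketched as a small classifier. The function name `classify_observation` and the quartile values in the usage lines are hypothetical.

```python
def classify_observation(x, lower_quartile, upper_quartile):
    """Classify x by its distance from the closest quartile, in IQR units."""
    iqr = upper_quartile - lower_quartile
    if lower_quartile <= x <= upper_quartile:
        return "not an outlier"
    # Distance to the nearest quartile for points outside the box.
    if x < lower_quartile:
        distance = lower_quartile - x
    else:
        distance = x - upper_quartile
    if distance > 3 * iqr:
        return "extreme outlier"
    if distance > 1.5 * iqr:
        return "mild outlier"
    return "not an outlier"


# Hypothetical quartiles: IQR = 10, so the cutoffs are 15 and 30 units.
for value in (30, 36, 51, -6):
    print(value, classify_observation(value, 10, 20))
```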
2.3 More Detailed Summary Quantities
→ Quartiles and the Interquartile Range
→ Boxplots → Boxplots That Show Outliers
➢ The basis for construction is a comparison between quantiles of the sample data and
the corresponding quantiles of the distribution under consideration.
➢ Recall that for any number p between 0 and 1, the pth quantile is the value such that area p
lies to its left under the density curve. For example, Appendix Table I shows
that the .9th quantile (90th percentile) for the standard normal distribution is
approximately 1.28, the .1th quantile is roughly -1.28, the .8th quantile is about .84,
and of course the .5th quantile (the median) is 0.
Let x(1) denote the smallest sample observation, x(2) the second smallest sample
observation, . . . , and x(n) the largest sample observation. We take x(1) to be the
(.5/n)th sample quantile, x(2) to be the (1.5/n)th sample quantile, . . . , and finally
x(n) to be the [(n - .5)/n]th sample quantile. That is, for i = 1, . . . , n, x(i) is the
[(i - .5)/n]th sample quantile.
➢ Thus when n = 20, x(1) is the .025th quantile, x(2) is the .075th quantile, x(3) is the
.125th quantile, . . . , and x(20) is the .975th quantile (97.5th percentile).
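The quantile levels (i - .5)/n and the matching standard normal quantiles can be computed with a short sketch. It relies on `statistics.NormalDist` from the Python standard library (Python 3.8 or later) in place of Appendix Table I; the function name `quantile_levels` is illustrative.

```python
from statistics import NormalDist


def quantile_levels(n):
    """Levels (i - .5)/n for i = 1, ..., n assigned to the ordered sample."""
    return [(i - 0.5) / n for i in range(1, n + 1)]


levels = quantile_levels(20)

# Standard normal (z) quantiles at those levels.
standard_normal = NormalDist()   # mean 0, standard deviation 1
z_quantiles = [standard_normal.inv_cdf(p) for p in levels]

print(levels[:3], levels[-1])
print(round(standard_normal.inv_cdf(0.9), 2))
```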
2.4 Quantile Plots
→ Sample Quantiles
→ A Normal Quantile Plot
• Suppose now that for i = 1, . . . , n, the quantities (i - .5)/n are calculated and the
corresponding quantiles are determined for a specified population or process
distribution whose plausibility is being investigated.
• If the sample were actually selected from the specified distribution, the sample
quantiles should be reasonably close to the corresponding distributional quantiles.
➢ In other words, pair the smallest quantile with the smallest observation, the second
smallest quantile with the second smallest observation, and so on.
➢ If the first number in each pair is close to the second number, the points in the plot
will fall close to a 45° line [one with slope 1 passing through the point (0, 0)].
➢ There is a linear relationship between z quantiles and those for any other normal
distribution:
normal (µ, σ) quantile = µ + σ · (z quantile)
2.4 Quantile Plots
→ Sample Quantiles
→ A Normal Quantile Plot
A normal quantile plot is a plot of the (z quantile, observation) pairs. The linear
relation between normal (µ, σ) quantiles and z quantiles implies that if the sample has
come from a normal distribution with particular values of µ and σ, the points in the
plot should fall close to a straight line with slope σ and vertical intercept µ. Thus a plot
for which the points fall close to some straight line suggests that the assumption of a
normal population or process distribution is plausible.
• Note that if a straight line is fit to the points in the plot, the intercept and slope give
estimates of µ and σ, respectively, though these will typically differ from the usual
estimates 𝑥̅ and s.
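The construction of the (z quantile, observation) pairs and the line fit can be sketched as below. Ordinary least squares is assumed here as the way to fit the line, since the slide does not specify a fitting method; the function names and the sample values are hypothetical. Plotting is left out so the sketch needs only the standard library.

```python
from statistics import NormalDist, mean


def normal_quantile_pairs(sample):
    """Pair the ith smallest observation with the z quantile at (i - .5)/n."""
    ordered = sorted(sample)
    n = len(ordered)
    std_normal = NormalDist()
    return [
        (std_normal.inv_cdf((i - 0.5) / n), x)
        for i, x in enumerate(ordered, start=1)
    ]


def fit_line(pairs):
    """Least-squares line through the pairs; returns (intercept, slope)."""
    zs = [z for z, _ in pairs]
    xs = [x for _, x in pairs]
    z_bar, x_bar = mean(zs), mean(xs)
    slope = sum((z - z_bar) * (x - x_bar) for z, x in pairs) / sum(
        (z - z_bar) ** 2 for z in zs
    )
    intercept = x_bar - slope * z_bar
    return intercept, slope


# Hypothetical sample, for illustration only.
sample = [9.1, 10.4, 8.7, 11.9, 10.0, 9.6, 12.3, 10.8]
pairs = normal_quantile_pairs(sample)
intercept, slope = fit_line(pairs)   # rough estimates of mu and sigma
print(intercept, slope)
```

Plotting the pairs with any charting tool and judging straightness by eye remains the main use; the fitted intercept and slope are a by-product.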
2.4 Quantile Plots
→ Sample Quantiles
→ A Normal Quantile Plot → Example 2.17
Thus there is a linear relation between the logarithm of Weibull quantiles and ln[-ln(1 - p)].
This suggests that we calculate ln(x(1)), . . . , ln(x(n)) and then plot the (ln[-ln(1 - p)], ln(x))
pairs. If the plot is reasonably straight, it is plausible that the sample has come from some
Weibull distribution.
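The pairs for such a plot can be computed with the sketch below. It assumes the same levels p = (i - .5)/n used for the normal quantile plot and strictly positive observations (so the logarithm is defined); the function name and the sample values are hypothetical.

```python
import math


def weibull_plot_pairs(sample):
    """Return (ln[-ln(1 - p)], ln(x(i))) pairs with p = (i - .5)/n."""
    ordered = sorted(sample)
    if ordered and ordered[0] <= 0:
        raise ValueError("all observations must be positive")
    n = len(ordered)
    pairs = []
    for i, x in enumerate(ordered, start=1):
        p = (i - 0.5) / n
        pairs.append((math.log(-math.log(1 - p)), math.log(x)))
    return pairs


# Hypothetical positive-valued sample, for illustration only.
sample = [1.2, 0.4, 2.9, 1.8, 0.9, 3.6]
for horizontal, vertical in weibull_plot_pairs(sample):
    print(round(horizontal, 3), round(vertical, 3))
```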
2.4 Quantile Plots
→ Sample Quantiles
→ Plots for Other Distributions → Example 2.18