

3.17 The five years 2012 to 2016 saw volatility in the value of shares.
The data in the following table give the annual percentage change
in the share market index for Hong Kong, the Hang Seng, and for
Australia, the S&P/ASX 200, for 2012 to 2016.

Year         2012    2013    2014    2015    2016
Hang Seng    22.9%    2.9%    1.3%   −7.2%    0.4%
ASX 200      14.6%   15.1%    1.1%   −2.1%    7.0%

Source: Data obtained from Yahoo 7 Finance <http://au.finance.yahoo.com>
accessed April 2017

a. For each index calculate the geometric rate of return for the
five years.
b. What conclusions can you reach concerning the geometric
rates of return of the two indices?

3.18 The annual returns (before tax and fees) on several managed
superannuation investment funds are:

Historical crediting rate for year ending 30 June (%)
Fund                     2017   2016   2015   2014   2013
Conservative balanced     5.3    7.5   10.2   11.6   11.7
Balanced                  9.2    6.1   11.0   13.9   15.9
High growth              16.6    0.0   13.9   18.9   20.7
Sustainable balanced     12.4    0.0   15.0   15.7   15.9

a. For each fund, calculate the geometric rate of return for three
years (2015 to 2017) and for five years (2013 to 2017).
b. What conclusions can you reach concerning the geometric
rates of return for the funds?

3.2 NUMERICAL DESCRIPTIVE MEASURES FOR A POPULATION

LEARNING OBJECTIVE 2
Calculate and interpret descriptive summary measures for a population

Section 3.1 introduces several statistics that describe the properties of central tendency,
variation and shape for a sample. If we have population data there are similar numerical
descriptive measures, called population parameters, of central tendency, variation and shape.
This section introduces three population parameters: population mean, population variance and
population standard deviation.
To illustrate these population parameters we use the data in Table 3.3, which classifies
road fatalities in Australia for 2016 by month and gender. Because the table gives the total,
male and female monthly road fatalities for 2016 for all of Australia, this is population data.

Table 3.3 Road fatalities in Australia 2016

                       Gender
Month        Unknown    Male   Female    Total
January            0      27       80      107
February           0      30       72      102
March              0      23       87      110
April              0      33       81      114
May                0      29       76      105
June               0      23       74       97
July               0      26       91      117
August             0      38       74      112
September          0      24       68       92
October            0      29       89      118
November           0      28       78      106
December           1      27       88      116
Total              1     337      958    1,296

Source: Data obtained from the Australian Road Deaths Database
<www.bitre.gov.au/statistics/safety/fatal_road_crash_database.aspx> accessed 4 May 2017.

Copyright © 2018. Pearson Education Australia. All rights reserved.

Population Mean

The population mean, defined by Equation 3.13, is represented by the symbol μ, the Greek
lower-case letter mu.

population mean
Mean calculated from population data.

Copyright © Pearson Australia (a division of Pearson Australia Group Pty Ltd) 2019— 9781488617249 — Berenson/Basic Business Statistics 5e

Berenson, M., Levine, D., Szabat, K., Watson, J., Jayne, N., & OBrien, M. (2018). Basic business statistics. Pearson Education Australia.
Created from scu on 2024-05-28 06:46:30.
114 CHAPTER 3 NUMERICAL DESCRIPTIVE MEASURES

POPULATION MEAN
The population mean is the sum of the values in the population divided by the population
size, N.

$$\mu = \frac{\sum_{i=1}^{N} X_i}{N} \qquad (3.13)$$

where μ = population mean
$X_i$ = ith value of the variable X
$\sum_{i=1}^{N} X_i$ = sum of all $X_i$ values in the population

To calculate the mean monthly number of road fatalities for 2016 from the Total column of
Table 3.3, use Equation 3.13:

$$\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{107 + 102 + \dots + 116}{12} = \frac{1296}{12} = 108$$

Thus, the mean number of road fatalities per month in 2016 was 108.
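The same calculation can be sketched in a few lines of Python. This is only an illustration, not part of the text: the names `monthly_totals` and `population_mean` are made up here, and the values are the Total column of Table 3.3.

```python
# Monthly total road fatalities in Australia for 2016 (Total column of Table 3.3).
monthly_totals = [107, 102, 110, 114, 105, 97, 117, 112, 92, 118, 106, 116]


def population_mean(values):
    """Population mean (Equation 3.13): the sum of the values divided by N."""
    return sum(values) / len(values)


mu = population_mean(monthly_totals)
print(f"N = {len(monthly_totals)}, sum = {sum(monthly_totals)}, mu = {mu}")
```

Because every month of 2016 is included, the result is the parameter μ itself rather than an estimate of it.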

Population Variance and Standard Deviation


The population variance and the population standard deviation measure variation in a
population. Like the related sample statistic, the population standard deviation is the square
root of the population variance. The population variance is represented by the symbol σ², the
Greek lower-case letter sigma squared, and the population standard deviation by the symbol σ.
These parameters are defined by Equations 3.14a and 3.15. The denominator in Equation
3.14a is N (population size) and not n − 1 as used in the equation for the sample variance
(see Equation 3.9a).

population variance
Variance calculated from population data.

population standard deviation
Standard deviation calculated from population data.

POPULATION VARIANCE – DEFINITION FORMULA
The population variance is the sum of the squared deviations from the population mean
divided by the population size N.

$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N} = \frac{SSX}{N} \qquad (3.14a)$$

where μ = population mean
$X_i$ = ith value of the variable X
$SSX = \sum_{i=1}^{N} (X_i - \mu)^2$ = sum of the squared deviations from the mean (sum of squares)

POPULATION STANDARD DEVIATION
The population standard deviation is the square root of the population variance.

$$\sigma = \sqrt{\sigma^2} \qquad (3.15)$$


As we did for sample variance and standard deviation, we can use algebra to obtain
alternative calculation formulas.

POPULATION VARIANCE – CALCULATION FORMULA
The population variance is the sum of the squared deviations from the population mean
divided by the population size N.

$$\sigma^2 = \frac{SSX}{N} = \frac{\sum_{i=1}^{N} X_i^2 - N\mu^2}{N} = \frac{\sum_{i=1}^{N} X_i^2 - \dfrac{\left(\sum_{i=1}^{N} X_i\right)^2}{N}}{N} \qquad (3.14b)$$

where μ = population mean
$X_i$ = ith value of the variable X
$\sum_{i=1}^{N} X_i^2 = X_1^2 + X_2^2 + \dots + X_N^2$ = sum of the squared $X_i$ values in the population

Use either calculation formula.


Using the data in Table 3.3 to calculate the population variance and standard deviation for
the 2016 monthly total road fatalities, first calculate:

$$\sum_{i=1}^{N} X_i^2 = 107^2 + 102^2 + \dots + 116^2 = 140{,}696$$

then use Equations 3.14b and 3.15 to obtain:

$$\sigma^2 = \frac{\sum_{i=1}^{N} X_i^2 - N\mu^2}{N} = \frac{140{,}696 - 12 \times 108^2}{12} = 60.666\ldots$$

$$\sigma = \sqrt{\sigma^2} = \sqrt{60.666\ldots} = 7.788\ldots$$

Thus, the variance of monthly total fatalities for 2016 is approximately 60.7 and the standard
deviation is approximately 7.8 fatalities per month. So, a typical 2016 monthly fatality count
differs from the mean of 108 by plus or minus 7.8.
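As a rough check, the definition formula (3.14a) and the calculation formula (3.14b) can both be coded and compared. The sketch below is illustrative only: the function names are invented for this example, and `statistics.pvariance` and `statistics.pstdev` are Python's standard-library population (divide-by-N) functions, used here purely as a cross-check.

```python
import math
import statistics

# Monthly total road fatalities in Australia for 2016 (Total column of Table 3.3).
monthly_totals = [107, 102, 110, 114, 105, 97, 117, 112, 92, 118, 106, 116]


def population_variance_definition(values):
    """Equation 3.14a: sum of squared deviations from mu (SSX), divided by N."""
    n = len(values)
    mu = sum(values) / n
    ssx = sum((x - mu) ** 2 for x in values)
    return ssx / n


def population_variance_calculation(values):
    """Equation 3.14b: (sum of squared values - N * mu^2), divided by N."""
    n = len(values)
    mu = sum(values) / n
    return (sum(x * x for x in values) - n * mu ** 2) / n


var_a = population_variance_definition(monthly_totals)
var_b = population_variance_calculation(monthly_totals)
sigma = math.sqrt(var_a)  # Equation 3.15

print(round(var_a, 3), round(var_b, 3), round(sigma, 3))
# Cross-check against the standard library's population versions.
print(round(statistics.pvariance(monthly_totals), 3),
      round(statistics.pstdev(monthly_totals), 3))
```

Note that `statistics.variance` and `statistics.stdev` (without the `p`) divide by n − 1 and would give the sample versions instead.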

The Empirical Rule


In many data sets a large portion of the values tend to cluster near the median. In right-skewed
data sets, this clustering occurs in the left or lower part of the distribution. In left-skewed data
sets, the values tend to cluster in the right or upper part of the distribution. In symmetrical data
sets, where the median and mean are similar, the values often cluster around the median and
mean, producing a bell-shaped distribution. You can use the empirical rule to examine the
variability in bell-shaped distributions, both population and sample.

bell-shaped
Symmetric, unimodal, mound-shaped distribution.

empirical rule
Gives the distribution of data values in terms of standard deviations from the mean for
bell-shaped distributions.

The empirical rule states that for bell-shaped distributions:
• Approximately 68% of the values are within a distance of ±1 standard deviation from the
mean. That is, approximately 68% of the data values have Z scores between −1 and 1.
• Approximately 95% of the values are within a distance of ±2 standard deviations from
the mean. That is, approximately 95% of the data values have Z scores between −2 and 2.
• Approximately 99.7% of the values are within a distance of ±3 standard deviations from
the mean. That is, approximately 99.7% of the data values have Z scores between −3 and 3.
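The three percentages can be reproduced numerically if "bell-shaped" is modelled by a normal distribution. That modelling choice is an assumption made for this sketch (the rule itself only requires an approximately bell-shaped data set), and `normal_coverage` is a name invented here; `statistics.NormalDist` is part of Python's standard library.

```python
from statistics import NormalDist


def normal_coverage(k):
    """Proportion of a normal distribution within k standard deviations of its mean."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)


for k in (1, 2, 3):
    print(f"within ±{k} standard deviations: {normal_coverage(k):.1%}")
```

The computed proportions are roughly 68.3%, 95.4% and 99.7%, which the empirical rule rounds to 68%, 95% and 99.7%.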


The empirical rule helps to identify outliers when analysing a set of numerical data. The
empirical rule implies that, for bell-shaped distributions, only about 1 in 20 values will be
beyond two standard deviations from the mean. As a general rule, you can consider values not
found in the interval μ ± 2σ (or X̄ ± 2S) as potential outliers. The rule also implies that only
about 3 in 1,000 will be beyond three standard deviations from the mean. Therefore, values not
found in the interval μ ± 3σ (or X̄ ± 3S) are almost always considered outliers. For heavily
skewed or non-bell-shaped data sets the Chebyshev rule, introduced next, should be used
instead of the empirical rule.

EXAMPLE 3.14 USING THE EMPIRICAL RULE
A population of 600-mL bottles of soft drink is known to have a mean fill-weight of 603 mL
and a standard deviation of 1 mL. The population is also known to be bell-shaped.
Describe the distribution of fill-weights. Is it very likely that a bottle will contain less
than 600 mL of soft drink?

SOLUTION
μ ± σ = 603 ± 1 = (602, 604)
μ ± 2σ = 603 ± 2(1) = (601, 605)
μ ± 3σ = 603 ± 3(1) = (600, 606)
Using the empirical rule, approximately 68% of the bottles will contain between 602 mL and
604 mL, approximately 95% will contain between 601 mL and 605 mL, and approximately
99.7% will contain between 600 mL and 606 mL. Therefore, it is highly unlikely that a bottle
will contain less than 600 mL of soft drink. Specifically, because of the assumed symmetry,
we would expect only 0.15% of bottles to have a volume of soft drink less than 600 mL (and
thus 0.15% above 606 mL).
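The 0.15% figure comes from splitting the share of values outside μ ± 3σ equally between the two tails, which is where the assumed symmetry is used:

```latex
\[
\frac{100\% - 99.7\%}{2} = \frac{0.3\%}{2} = 0.15\%
\]
```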

The Chebyshev Rule


The Chebyshev rule states that, for all data sets, population or sample, the percentage of values
within k standard deviations of the mean must be at least:

$$\left[1 - \left(\frac{1}{k}\right)^2\right] \times 100\%$$

Chebyshev rule
Gives lower bounds of the distribution of data values in terms of standard deviations from the
mean for any distribution.

You can use this rule for any value of k greater than 1. Consider k = 2. The Chebyshev rule
states that at least [1 − (1/2)²] × 100% = 75% of the values must be within ±2 standard
deviations of the mean.
The Chebyshev rule is very general and applies to any distribution. The rule gives the
percentage of values that must at least be within a given distance from the mean. However, if
the data set is approximately bell-shaped, the empirical rule will more accurately reflect the
greater concentration of data close to the mean. Table 3.4 compares the Chebyshev and
empirical rules.

Table 3.4 How data vary around the mean

                      % of values found in intervals around the mean
Interval              Chebyshev                 Empirical rule
                      (any distribution)        (bell-shaped distribution)
(μ − σ, μ + σ)        At least 0%               Approximately 68%
(μ − 2σ, μ + 2σ)      At least 75%              Approximately 95%
(μ − 3σ, μ + 3σ)      At least 88.89%           Approximately 99.7%
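The Chebyshev column of Table 3.4 can be regenerated directly from the formula [1 − (1/k)²] × 100%. The following is a minimal sketch; `chebyshev_lower_bound` is a name chosen for this illustration, and flooring the result at 0% for k ≤ 1 is a convenience added here, since the rule is only informative for k greater than 1.

```python
def chebyshev_lower_bound(k):
    """Minimum percentage of values within k standard deviations of the mean.

    Implements [1 - (1/k)^2] x 100%. For k <= 1 the formula gives 0% or less,
    which carries no information, so the result is floored at 0.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return max(0.0, (1 - (1 / k) ** 2) * 100)


for k in (1, 2, 3):
    print(f"k = {k}: at least {chebyshev_lower_bound(k):.2f}%")
```

Non-integer values work the same way; for example, k = 1.5 gives a lower bound of about 55.56%.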


EXAMPLE 3.15 USING THE CHEBYSHEV RULE

As in Example 3.14, a population of 600-mL bottles of soft drink is known to have a mean
fill-weight of 603 mL and a standard deviation of 1 mL. However, the shape of the
population is unknown and you cannot assume that it is bell-shaped. Describe the distribution of
fill-weights. Is it very likely that a bottle will contain less than 600 mL of soft drink?

SOLUTION
μ ± σ = 603 ± 1 = (602, 604)
μ ± 2σ = 603 ± 2(1) = (601, 605)
μ ± 3σ = 603 ± 3(1) = (600, 606)
Because the distribution may not be bell-shaped, the empirical rule should not be used. Using
the Chebyshev rule, you cannot say anything about the percentage of bottles containing between
602 mL and 604 mL. You can state that at least 75% of the bottles will contain between 601 mL
and 605 mL, and at least 88.89% will contain between 600 mL and 606 mL. Therefore, it is
possible that up to 11.11% of bottles fall outside this last interval, containing less than
600 mL of soft drink or more than 606 mL.

These two rules apply to both population and sample data. For sample data, use the sample
mean X̄ and sample standard deviation S in place of the population parameters μ and σ.
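To see both rules against real values, the sketch below counts how many of the 2016 monthly totals from Table 3.3 fall within 1, 2 and 3 population standard deviations of the mean and prints the Chebyshev lower bound alongside. It is an illustration only: `percent_within` is a helper invented for this example, and with just 12 values any comparison with the empirical rule is rough.

```python
import statistics

# Monthly total road fatalities in Australia for 2016 (Total column of Table 3.3).
monthly_totals = [107, 102, 110, 114, 105, 97, 117, 112, 92, 118, 106, 116]

mu = statistics.fmean(monthly_totals)      # population mean
sigma = statistics.pstdev(monthly_totals)  # population standard deviation (divides by N)


def percent_within(values, centre, spread, k):
    """Percentage of values lying within k * spread of centre (inclusive)."""
    inside = [x for x in values if abs(x - centre) <= k * spread]
    return 100 * len(inside) / len(values)


for k in (1, 2, 3):
    observed = percent_within(monthly_totals, mu, sigma, k)
    chebyshev = max(0.0, (1 - 1 / k ** 2) * 100)
    print(f"k = {k}: observed {observed:.1f}%, Chebyshev lower bound {chebyshev:.2f}%")
```

For these data, 7 of the 12 months (about 58.3%) lie within one standard deviation, 11 of 12 (about 91.7%) within two, and all 12 within three. Each observed percentage meets the Chebyshev lower bound, as it must for any data set.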

Problems for Section 3.2


LEARNING THE BASICS
3.19 The data below are for a population with N = 10:
7 5 11 8 3 6 2 1 9 8
a. Calculate the population mean.
b. Calculate the population standard deviation.
3.20 The data below are for a population with N = 10:
7 5 6 6 6 4 8 6 9 3
a. Calculate the population mean.
b. Calculate the population standard deviation.

APPLYING THE CONCEPTS
3.21 Analyse the road fatality data for 2016 given in
< MONTHLY_FATALITY_2016 > for each gender by:
a. Calculating the mean, variance and standard deviation.
b. Finding the proportion of months that have fatalities within
one and two standard deviations of the mean.
c. Comparing your findings with what would be expected on
the basis of the empirical rule.
3.22 Naturally Soap is a small business, based in a coastal town, that
makes and sells natural, luxurious, handmade soap bars in a
variety of scents. Presently the soap is sold at local markets:
Wednesday evening in the coastal town where the business is
located, and a scheduled Sunday morning market in a roster of
local villages. During the last six months, Naturally Soap has
also been available via the Internet.
Naturally Soap is interested in analysing the quantity sold
weekly at each market and Internet sales.
While Naturally Soap has complete sales and price data for
both markets for the previous year, due to a computer ‘problem’
there is only a sample of weekly sales and price data for the
Internet sales. The data is stored in the < NATURALLY_SOAP >
file.
a. For the Sunday morning market:
i. Calculate the mean, variance and standard deviation of
the weekly sales for the year.
ii. What conclusions can you make about the weekly sales
for this market?
iii. Use the empirical rule or the Chebyshev rule, whichever
is appropriate, to further explain the variation in the
weekly sales.
iv. Using the results in (iii), are there any outliers? Explain.
b. Repeat (a) for the Wednesday evening market.
3.23 The ages, to the nearest year, of all employees at a certain
fast-food outlet are:
19 19 45 20 21 21 18 20 23 17
a. Calculate the mean, variance and standard deviation.
b. Calculate the Z scores.
c. Based on the results of (a) and (b), what conclusions can you
reach about employee ages at this fast-food outlet?
3.24 The file < HOURS > gives the hours worked during a recent
week by all 30 employees of a local bakery.
For this week:
a. Calculate and interpret the mean hours worked.
b. Calculate the variance and standard deviation of the hours
worked. Interpret the standard deviation.
c. Use the empirical rule or the Chebyshev rule, whichever is
appropriate, to explain further the variation in the hours
worked.
d. Using the results in (c), are there any outliers? Explain.
