
Slides Prepared by

JOHN S. LOUCKS
St. Edward’s University

© 2002 South-Western /Thomson Learning


Slide 1

Chapter 3
Descriptive Statistics: Numerical Methods
 Measures of Location
• Mean
• Median
• Mode
• Percentiles
• Quartiles
 Measures of Variability
• Range
• Interquartile Range
• Variance
• Standard Deviation
• Coefficient of Variation (risk to reward ratio)
 Measures of Relative Location and Detecting Outliers
• z-Scores
• Chebyshev’s Theorem
• Empirical Rule
• Detecting Outliers
 Exploratory Data Analysis
• Box plot
 Measures of Association Between Two Variables
 The Weighted Mean and Working with Grouped Data

Slide 2

Measures of Location

 Mean
 Median
 Mode
 Percentiles
 Quartiles

Slide 3

Example: Apartment Rents

Given below is a sample of monthly rent values ($) for one-bedroom apartments. The data are a sample of 70 apartments in a particular city, presented in ascending order.

Slide 4

Mean

 The mean of a data set is the average of all the data values.
 If the data are from a sample, the mean is denoted by $\bar{x}$:
$\bar{x} = \frac{\sum x_i}{n}$
 If the data are from a population, the mean is denoted by $\mu$ (mu):
$\mu = \frac{\sum x_i}{N}$
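
As a minimal sketch of the definition above, the mean is the sum of the values divided by their count. The five rent values are hypothetical illustration data, not the 70-apartment sample.

```python
from statistics import mean

# Hypothetical illustration data (not the 70-apartment sample).
rents = [425, 450, 475, 525, 615]

# Mean: sum of the data values divided by the number of values.
x_bar = sum(rents) / len(rents)

# The standard library computes the same quantity.
assert x_bar == mean(rents)
print(x_bar)  # 498.0
```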

Slide 5

Example: Apartment Rents

 Mean
$\bar{x} = \frac{\sum x_i}{n} = \frac{34{,}356}{70} = 490.80$

Slide 6

Median

 The median is the measure of location most often reported for annual income and property value data.
 A few extremely large incomes or property values can inflate the mean, whereas the median is largely unaffected.

Slide 7

Median

 The median of a data set is the value in the middle when the data items are arranged in ascending order.
 For an odd number of observations, the median is the middle value.
 For an even number of observations, the median is the average of the two middle values.
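
The odd and even cases above can be sketched as a small function. The function name and the sample values are hypothetical, chosen only for illustration.

```python
def median(values):
    """Middle value for an odd count, average of the two middle values for an even count."""
    ordered = sorted(values)              # arrange in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                        # odd number of observations
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even number of observations

print(median([525, 425, 475]))        # 475
print(median([425, 450, 475, 525]))   # 462.5
```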

Slide 8

Example: Apartment Rents

 Median
Median = 50th percentile
i = (p/100)n = (50/100)70 = 35
Because i is an integer, average the 35th and 36th data values:
Median = (475 + 475)/2 = 475

Slide 9

Mode

 The mode of a data set is the value that occurs with greatest frequency.
 The greatest frequency can occur at two or more different values.
 If the data have exactly two modes, the data are bimodal.
 If the data have more than two modes, the data are multimodal.
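
A short sketch of finding one or several modes with Python's standard library. The two data sets are hypothetical illustration values.

```python
from statistics import multimode

# Hypothetical illustration data.
one_mode = [440, 450, 450, 450, 475]
two_modes = [440, 440, 450, 450, 475]

# multimode returns every value that shares the greatest frequency.
print(multimode(one_mode))    # [450]
print(multimode(two_modes))   # [440, 450], so this data set is bimodal
```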

Slide 10

Example: Apartment Rents

 Mode
450 occurred most frequently (7 times)
Mode = 450

Slide 11

Percentiles

 A percentile provides information about how the data are spread over the interval from the smallest value to the largest value.
 Admission test scores for colleges and universities are frequently reported in terms of percentiles.

Slide 12

Percentiles
 The pth percentile of a data set is a value such that at
least p percent of the items take on this value or less
and at least (100 - p) percent of the items take on this
value or more.
• Arrange the data in ascending order.
• Compute index i, the position of the pth percentile.

i = (p/100)n
• If i is not an integer, round up. The pth percentile is
the value in the ith position.
• If i is an integer, the pth percentile is the average of
the values in positions i and i+1.
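
The index rule above can be sketched as follows. This is an illustrative implementation of the procedure on this slide, not a library routine (library percentile functions often interpolate differently). The function name and the test data (the integers 1 to 70) are assumptions made for the example.

```python
import math

def percentile(values, p):
    """p-th percentile using the index rule i = (p/100) * n, for 0 < p < 100."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)              # arrange in ascending order
    n = len(ordered)
    i = p * n / 100                       # multiply before dividing to limit float error
    if i != int(i):
        # i is not an integer: round up and take the value in that position.
        return ordered[math.ceil(i) - 1]  # positions are 1-based
    # i is an integer: average the values in positions i and i + 1.
    i = int(i)
    return (ordered[i - 1] + ordered[i]) / 2

data = list(range(1, 71))        # hypothetical data: the integers 1..70
print(percentile(data, 50))      # 35.5  (i = 35, average of positions 35 and 36)
print(percentile(data, 75))      # 53    (i = 52.5, rounded up to position 53)
print(percentile(data, 90))      # 63.5  (i = 63, average of positions 63 and 64)
```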

Slide 13


Example: Apartment Rents

 90th Percentile
i = (p/100)n = (90/100)70 = 63
Averaging the 63rd and 64th data values:
90th Percentile = (580 + 590)/2 = 585

Slide 14

Quartiles

 Quartiles are specific percentiles.
 First Quartile = 25th Percentile
 Second Quartile = 50th Percentile = Median
 Third Quartile = 75th Percentile

Slide 15

Example: Apartment Rents

 Third Quartile
Third quartile = 75th percentile
i = (p/100)n = (75/100)70 = 52.5, which rounds up to 53
Third quartile = value in the 53rd position = 525

Slide 16

Measures of Variability

 It is often desirable to consider measures of variability (dispersion), as well as measures of location.
 For example, in choosing supplier A or supplier B we might consider not only the average delivery time for each, but also the variability in delivery time for each.

Slide 17


Measures of Variability

 Range
 Interquartile Range
 Variance
 Standard Deviation
 Coefficient of Variation

Slide 18

Range

 The range of a data set is the difference between the largest and smallest data values.
 It is the simplest measure of variability.
 It is very sensitive to the smallest and largest data values.

Slide 19


Example: Apartment Rents

 Range
Range = largest value - smallest value
Range = 615 - 425 = 190

Slide 20

Interquartile Range

 The interquartile range of a data set is the difference between the third quartile and the first quartile.
 It is the range for the middle 50% of the data.
 It overcomes the range's sensitivity to extreme data values.

Slide 21


Example: Apartment Rents

 Interquartile Range
3rd Quartile (Q3) = 525
1st Quartile (Q1) = 445
Interquartile Range = Q3 - Q1 = 525 - 445 = 80
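
To illustrate why the interquartile range is less sensitive to extreme values than the range, the sketch below replaces the largest value of a small hypothetical data set with an extreme one. The helper names and the data are assumptions. The quartiles follow the deck's percentile rule (round a non-integer index up, average two positions for an integer index).

```python
import math

def percentile(values, p):
    """Percentile by the index rule i = (p/100) * n."""
    ordered = sorted(values)
    i = p * len(ordered) / 100
    if i != int(i):
        return ordered[math.ceil(i) - 1]
    i = int(i)
    return (ordered[i - 1] + ordered[i]) / 2

def value_range(values):
    return max(values) - min(values)

def iqr(values):
    return percentile(values, 75) - percentile(values, 25)

data = [425, 440, 450, 475, 500, 525, 550, 615]   # hypothetical values
with_extreme = data[:-1] + [1500]                  # largest value replaced by an extreme one

print(value_range(data), iqr(data))                   # 190 92.5
print(value_range(with_extreme), iqr(with_extreme))   # 1075 92.5
```

The range jumps from 190 to 1075, while the interquartile range stays at 92.5.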

Slide 22

Variance

 The variance is a measure of variability that utilizes all the data.
 It is based on the difference between the value of each observation ($x_i$) and the mean ($\bar{x}$ for a sample, $\mu$ for a population).

Slide 23

Variance

 The variance is the average of the squared differences between each data value and the mean.
 If the data set is a sample, the variance is denoted by $s^2$:
$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
 If the data set is a population, the variance is denoted by $\sigma^2$:
$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$
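
A minimal sketch of the two variance calculations, assuming the usual convention that the sample variance divides by n - 1 and the population variance divides by N. The data values are hypothetical.

```python
from statistics import variance, pvariance

def sample_variance(values):
    n = len(values)
    x_bar = sum(values) / n
    # Squared differences from the mean, divided by n - 1 for a sample.
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

def population_variance(values):
    N = len(values)
    mu = sum(values) / N
    # Squared differences from the mean, divided by N for a population.
    return sum((x - mu) ** 2 for x in values) / N

data = [425, 450, 475, 525, 615]      # hypothetical illustration data
print(sample_variance(data))          # 5645.0
print(population_variance(data))      # 4516.0

# The standard library agrees with both hand calculations.
assert sample_variance(data) == variance(data)
assert population_variance(data) == pvariance(data)
```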

Slide 24

Standard Deviation

 The standard deviation of a data set is the positive square root of the variance.
 It is measured in the same units as the data, which makes it easier than the variance to compare with the mean.
 If the data set is a sample, the standard deviation is denoted $s$:
$s = \sqrt{s^2}$
 If the data set is a population, the standard deviation is denoted $\sigma$ (sigma):
$\sigma = \sqrt{\sigma^2}$
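
As a brief sketch, Python's standard library provides both the sample and the population standard deviation. The data values are hypothetical illustration data.

```python
from statistics import stdev, pstdev

data = [425, 450, 475, 525, 615]   # hypothetical illustration data

s = stdev(data)        # sample standard deviation (square root of the sample variance)
sigma = pstdev(data)   # population standard deviation

print(round(s, 2))      # 75.13
print(round(sigma, 2))  # 67.2
```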

Slide 25

Coefficient of Variation

 The coefficient of variation indicates how large the standard deviation is in relation to the mean.
 If the data set is a sample, the coefficient of variation is computed as follows:
$\frac{s}{\bar{x}} \times 100$
 If the data set is a population, the coefficient of variation is computed as follows:
$\frac{\sigma}{\mu} \times 100$
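
A small sketch of the calculation, assuming the coefficient of variation is expressed as a percentage (standard deviation divided by mean, times 100). The inputs are the sample mean (490.80) and sample standard deviation (54.74) that this deck reports for the apartment rents. The function name is hypothetical.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean."""
    return std_dev / mean * 100

cv = coefficient_of_variation(54.74, 490.80)
print(round(cv, 2))   # 11.15
```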

Slide 26

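Example: Apartment Rents

 Variance
$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
 Standard Deviation
$s = \sqrt{s^2} = 54.74$
 Coefficient of Variation
$\frac{s}{\bar{x}} \times 100 = \frac{54.74}{490.80} \times 100 = 11.15\%$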

Slide 27

Measures of Relative Location and Detecting Outliers

 z-Scores
 Chebyshev’s Theorem
 Empirical Rule
 Detecting Outliers

Slide 28

z-Scores

 The z-score is often called the standardized value.
 It denotes the number of standard deviations a data value $x_i$ is from the mean:
$z_i = \frac{x_i - \bar{x}}{s}$
 A data value less than the sample mean will have a z-score less than zero.
 A data value greater than the sample mean will have a z-score greater than zero.
 A data value equal to the sample mean will have a z-score of zero.
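
A minimal sketch of the z-score calculation, using the sample mean (490.80) and standard deviation (54.74) reported in this deck for the apartment rents, together with the smallest (425) and largest (615) rent values. The function name is hypothetical.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev

x_bar, s = 490.80, 54.74

print(round(z_score(425, x_bar, s), 2))      # -1.2  (below the mean, so negative)
print(round(z_score(615, x_bar, s), 2))      # 2.27  (above the mean, so positive)
print(z_score(490.80, x_bar, s))             # 0.0   (equal to the mean)
```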

Slide 29

Example: Apartment Rents

 z-Score of Smallest Value (425)
$z = \frac{x_i - \bar{x}}{s} = \frac{425 - 490.80}{54.74} = -1.20$

[Table: Standardized Values for Apartment Rents]

Slide 30

Chebyshev’s Theorem

At least $(1 - 1/k^2)$ of the items in any data set will be within $k$ standard deviations of the mean, where $k$ is any value greater than 1.
• At least 75% of the items must be within k = 2 standard deviations of the mean.
• At least 89% of the items must be within k = 3 standard deviations of the mean.
• At least 94% of the items must be within k = 4 standard deviations of the mean.
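
The bound above can be evaluated for any k greater than 1. The sketch below reproduces the three percentages on this slide and adds k = 1.5. The function name is hypothetical.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of items within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2

for k in (1.5, 2, 3, 4):
    # Prints the bound as a whole-number percentage: 56, 75, 89, 94.
    print(k, round(chebyshev_lower_bound(k) * 100))
```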

Slide 31

Example: Apartment Rents

 Chebyshev’s Theorem

Let k = 1.5 with $\bar{x}$ = 490.80 and s = 54.74.

At least $(1 - 1/(1.5)^2)$ = 1 - 0.44 = 0.56, or 56%, of the rent values must be between
$\bar{x} - k(s)$ = 490.80 - 1.5(54.74) ≈ 409
and
$\bar{x} + k(s)$ = 490.80 + 1.5(54.74) ≈ 573

Slide 32

Example: Apartment Rents

 Chebyshev’s Theorem (continued)

Actually, 86% of the rent values are between 409 and 573.

Slide 33

Empirical Rule

For data having a bell-shaped distribution:

• Approximately 68% of the data values will be within one standard deviation of the mean.

Slide 34

Empirical Rule

For data having a bell-shaped distribution:

• Approximately 95% of the data values will be within two standard deviations of the mean.

Slide 35

Empirical Rule

For data having a bell-shaped distribution:

• Almost all (99.7%) of the items will be within three standard deviations of the mean.
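
The three percentages can be checked against a normal distribution, taken here as an assumed model of "bell-shaped" data. The sketch uses the standard normal distribution from Python's standard library. The function name is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()   # mean 0, standard deviation 1

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k) * 100, 1))   # 68.3, 95.4, 99.7
```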

Slide 36

Example: Apartment Rents

 Empirical Rule
                  Interval            % in Interval
Within +/- 1s     436.06 to 545.54    48/70 = 69%
Within +/- 2s     381.32 to 600.28    68/70 = 97%
Within +/- 3s     326.58 to 655.02    70/70 = 100%

Slide 37

Detecting Outliers

 An outlier is an unusually small or unusually large value in a data set.
 A data value with a z-score less than -3 or greater than +3 might be considered an outlier.
 It might be an incorrectly recorded data value.
 It might be a data value that was incorrectly included in the data set.
 It might be a correctly recorded data value that belongs in the data set!
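
A sketch of the z-score screen described above, flagging values whose z-score is below -3 or above +3. The function name and the data are hypothetical. A flagged value is only a candidate for review, since it may still be a correctly recorded value that belongs in the data set.

```python
from statistics import mean, stdev

def z_score_outliers(values, threshold=3):
    """Values whose absolute z-score exceeds the threshold."""
    x_bar, s = mean(values), stdev(values)
    return [x for x in values if abs((x - x_bar) / s) > threshold]

# Hypothetical data: nineteen typical values and one extreme value.
data = [450] * 10 + [460] * 9 + [1000]
print(z_score_outliers(data))   # [1000]
```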

Slide 38

Example: Apartment Rents

 Detecting Outliers
The most extreme z-scores are -1.20 and 2.27.
Using |z| > 3 as the criterion for an outlier, there are no outliers in this data set.

[Table: Standardized Values for Apartment Rents]

Slide 39


Exploratory Data Analysis

 Five-Number Summary
 Box Plot

Slide 40

Five-Number Summary

 Smallest Value
 First Quartile
 Median
 Third Quartile
 Largest Value

Slide 41

Example: Apartment Rents

 Five-Number Summary
Lowest Value = 425
First Quartile = 445
Median = 475
Third Quartile = 525
Largest Value = 615

Slide 42

Box Plot

 A box is drawn with its ends located at the first and third quartiles.
 A vertical line is drawn in the box at the location of the median.
 Limits are located (not drawn) using the interquartile range (IQR).
• The lower limit is located 1.5(IQR) below Q1.
• The upper limit is located 1.5(IQR) above Q3.
• Data outside these limits are considered outliers.
… continued

Slide 43

Box Plot (Continued)

 Whiskers (dashed lines) are drawn from the ends of the box to the smallest and largest data values inside the limits.
 The location of each outlier is shown with the symbol *.
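
The box plot limits can be sketched as below. The inputs are the quartiles from the interquartile range example in this deck (Q1 = 445, Q3 = 525) and the smallest and largest rent values (425 and 615). The function name is hypothetical.

```python
def box_plot_limits(q1, q3):
    """Lower and upper limits located 1.5(IQR) beyond the quartiles."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

lower, upper = box_plot_limits(445, 525)
print(lower, upper)   # 325.0 645.0

# Only the extremes need checking: if they are inside the limits, every value is.
outliers = [x for x in (425, 615) if x < lower or x > upper]
print(outliers)       # []
```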

Slide 44

Example: Apartment Rents

 Box Plot

Lower Limit: Q1 - 1.5(IQR) = 445 - 1.5(80) = 325
Upper Limit: Q3 + 1.5(IQR) = 525 + 1.5(80) = 645
There are no outliers.

[Figure: box plot of the rent values on an axis marked from 375 to 625 in steps of 25.]

Slide 45


Measures of Association
Between Two Variables
 Covariance
 Correlation Coefficient

Slide 46

Covariance

 The covariance is a measure of the linear association between two variables.
 Positive values indicate a positive relationship.
 Negative values indicate a negative relationship.

Slide 47

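Covariance

 If the data sets are samples, the covariance is denoted by $s_{xy}$:
$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$
 If the data sets are populations, the covariance is denoted by $\sigma_{xy}$:
$\sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N}$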

Slide 48

Correlation Coefficient

 The coefficient can take on values between -1 and +1.
 Values near -1 indicate a strong negative linear relationship.
 Values near +1 indicate a strong positive linear relationship.
 If the data sets are samples, the coefficient is $r_{xy}$:
$r_{xy} = \frac{s_{xy}}{s_x s_y}$
 If the data sets are populations, the coefficient is $\rho_{xy}$:
$\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$
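
A minimal sketch of the sample covariance and the sample correlation coefficient, assuming the usual definitions: covariance divides the summed cross-products of deviations by n - 1, and correlation divides the covariance by the product of the two sample standard deviations. The function names and paired data are hypothetical.

```python
from math import sqrt

def sample_covariance(xs, ys):
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    # Cross-products of deviations from each mean, divided by n - 1.
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

def sample_correlation(xs, ys):
    s_x = sqrt(sample_covariance(xs, xs))   # covariance of x with itself is its variance
    s_y = sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (s_x * s_y)

# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

print(sample_covariance(x, y))              # 1.5  (positive relationship)
print(round(sample_correlation(x, y), 3))   # 0.775
```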

Slide 49

The Weighted Mean and Working with Grouped Data

 Weighted Mean
 Mean for Grouped Data
 Variance for Grouped Data
 Standard Deviation for Grouped Data

Slide 50

Weighted Mean

 When the mean is computed by giving each data value a weight that reflects its importance, it is referred to as a weighted mean.
 In the computation of a grade point average (GPA), the weights are the number of credit hours earned for each grade.
 When data values vary in importance, the analyst must choose the weight that best reflects the importance of each value.

Slide 51

Weighted Mean

$\bar{x} = \frac{\sum w_i x_i}{\sum w_i}$

where:
$x_i$ = value of observation i
$w_i$ = weight for observation i
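
The weighted mean formula can be sketched with the GPA setting mentioned earlier, where credit hours act as weights. The grade points and credit hours below are hypothetical, as is the function name.

```python
def weighted_mean(values, weights):
    """Sum of weight times value, divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

grade_points = [4.0, 3.0, 2.0]   # hypothetical grades: A, B, C
credit_hours = [3, 4, 3]         # hypothetical weights

print(weighted_mean(grade_points, credit_hours))   # 3.0
```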

Slide 52

Grouped Data

 The weighted mean computation can be used to obtain approximations of the mean, variance, and standard deviation for the grouped data.
 To compute the weighted mean, we treat the midpoint of each class as though it were the mean of all items in the class.
 We compute a weighted mean of the class midpoints using the class frequencies as weights.
 Similarly, in computing the variance and standard deviation, the class frequencies are used as weights.

Slide 53

Mean for Grouped Data

 Sample Data
$\bar{x} = \frac{\sum f_i M_i}{n}$
 Population Data
$\mu = \frac{\sum f_i M_i}{N}$

where:
$f_i$ = frequency of class i
$M_i$ = midpoint of class i
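
A sketch of the grouped-data mean as a weighted mean of class midpoints, with class frequencies as weights. The three-class frequency distribution is hypothetical and is not the apartment rent distribution. The function name is also an assumption.

```python
def grouped_mean(midpoints, frequencies):
    """Approximate mean: sum of frequency times midpoint, divided by total frequency."""
    n = sum(frequencies)
    return sum(f * m for m, f in zip(midpoints, frequencies)) / n

# Hypothetical classes 420-439, 440-459, 460-479.
midpoints = [429.5, 449.5, 469.5]
frequencies = [2, 5, 3]

print(grouped_mean(midpoints, frequencies))   # 451.5
```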

Slide 54

Example: Apartment Rents

Given below is the previous sample of monthly rents for one-bedroom apartments, presented here as grouped data in the form of a frequency distribution.

Slide 55

Example: Apartment Rents

 Mean for Grouped Data

This approximation differs by $2.41 from the actual sample mean of $490.80.

Slide 56

Variance for Grouped Data

 Sample Data
$s^2 = \frac{\sum f_i (M_i - \bar{x})^2}{n - 1}$
 Population Data
$\sigma^2 = \frac{\sum f_i (M_i - \mu)^2}{N}$
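
A sketch of the grouped-data sample variance and standard deviation, assuming squared deviations of class midpoints from the grouped mean are weighted by class frequency and divided by n - 1. The frequency distribution is hypothetical, and the function name is an assumption.

```python
from math import sqrt

def grouped_sample_variance(midpoints, frequencies):
    n = sum(frequencies)
    # Grouped mean: frequency-weighted mean of the class midpoints.
    x_bar = sum(f * m for m, f in zip(midpoints, frequencies)) / n
    # Frequency-weighted squared deviations, divided by n - 1 for a sample.
    return sum(f * (m - x_bar) ** 2 for m, f in zip(midpoints, frequencies)) / (n - 1)

# Hypothetical classes 420-439, 440-459, 460-479.
midpoints = [429.5, 449.5, 469.5]
frequencies = [2, 5, 3]

s2 = grouped_sample_variance(midpoints, frequencies)
print(round(s2, 2))         # 217.78
print(round(sqrt(s2), 2))   # 14.76
```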

Slide 57

Example: Apartment Rents

 Variance for Grouped Data

 Standard Deviation for Grouped Data

This approximation differs by only $0.20 from the actual standard deviation of $54.74.

Slide 58

Supplementary Problems for Chapter Three

Slide 59


Slide 60

End of Chapter 3

Slide 61
