Paris Graduate School of Management (PGSM)

International Executive Master of Business Administration

École Supérieure de Gestion et Commerce International



Management Decision Making
January 2024

Chapter 1
Data and Statistics
n Applications in Business and Economics
n Data
n Data Sources
n Descriptive Statistics
n Statistical Inference

Applications in
Business and Economics
n Accounting
Public accounting firms use statistical sampling procedures when
conducting audits for their clients.
n Finance
Financial advisors use a variety of statistical information, including
price-earnings ratios and dividend yields, to guide their
investment recommendations.
n Marketing
Electronic point-of-sale scanners at retail checkout counters are
being used to collect data for a variety of marketing research
applications.

Applications in
Business and Economics
n Production
A variety of statistical quality control charts are used to
monitor the output of a production process.
n Economics
Economists use statistical information in making
forecasts about the future of the economy or some
aspect of it.

n Elements, Variables, and Observations

n Scales of Measurement
n Qualitative and Quantitative Data
n Cross-Sectional and Time Series Data

Data and Data Sets

n Data are the facts and figures that are collected,
summarized, analyzed, and interpreted.
n The data collected in a particular study are referred to as
the data set.

Elements, Variables, and Observations

n The elements are the entities on which data are
collected.
n A variable is a characteristic of interest for the
elements.
n The set of measurements collected for a particular
element is called an observation.
n The total number of data values in a data set is the
number of elements multiplied by the number of
variables.

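Data, Data Sets, Elements, Variables, and Observations

Company        Stock Exchange   Annual Sales ($M)   Earnings per Share ($)
Dataram        AMEX                   73.10                 0.86
EnergySouth    OTC                    74.00                 1.67
Keystone       NYSE                  365.70                 0.86
LandCare       NYSE                  111.40                 0.33
Psychemedics   AMEX                   17.60                 0.13

Elements: the five companies (rows). Variables: stock exchange, annual sales, and earnings per share (columns).
Data set: the entire table. Datum: any single value in the table.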

Scales of Measurement
n Scales of measurement include:
• Nominal
• Ordinal
• Interval
• Ratio
n The scale determines the amount of information
contained in the data.
n The scale indicates the data summarization and
statistical analyses that are most appropriate.

Scales of Measurement
n Nominal
• Data are labels or names used to identify an
attribute of the element.
• A nonnumeric label or a numeric code may be used.

Scales of Measurement
n Nominal
• Example:
Students of a university are classified by the school
in which they are enrolled using a nonnumeric label
such as Business, Humanities, Education, and so on.
Alternatively, a numeric code could be used for the
school variable (e.g. 1 denotes Business, 2 denotes
Humanities, 3 denotes Education, and so on).

Scales of Measurement

n Ordinal
• The data have the properties of nominal data and
the order or rank of the data is meaningful.
• A nonnumeric label or a numeric code may be used.

Scales of Measurement
n Ordinal
• Example:
Students of a university are classified by their
class standing using a nonnumeric label such as
Freshman, Sophomore, Junior, or Senior.
Alternatively, a numeric code could be used for
the class standing variable (e.g. 1 denotes
Freshman, 2 denotes Sophomore, and so on).

Scales of Measurement
n Interval
• The data have the properties of ordinal data and the
interval between observations is expressed in terms
of a fixed unit of measure.
• Interval data are always numeric.

Scales of Measurement
n Interval
• Example:
Melissa has an SAT score of 1205, while Kevin has
an SAT score of 1090. Melissa scored 115 points
more than Kevin.

Scales of Measurement
n Ratio
• The data have all the properties of interval data and
the ratio of two values is meaningful.
• Variables such as distance, height, weight, and time
use the ratio scale.
• This scale must contain a zero value that indicates
that nothing exists for the variable at the zero point.

Scales of Measurement
n Ratio
• Example:
Melissa’s college record shows 36 credit hours
earned, while Kevin’s record shows 72 credit
hours earned. Kevin has twice as many credit
hours earned as Melissa.

Qualitative and Quantitative Data

n Data can be further classified as being qualitative or
quantitative.
n The statistical analysis that is appropriate depends on
whether the data for the variable are qualitative or
quantitative.
n In general, there are more alternatives for statistical
analysis when the data are quantitative.

Qualitative Data
n Qualitative data are labels or names used to identify an
attribute of each element.
n Qualitative data use either the nominal or ordinal scale
of measurement.
n Qualitative data can be either numeric or nonnumeric.
n The statistical analyses appropriate for qualitative data are rather
limited.

Quantitative Data
n Quantitative data indicate either how many or how
much.
• Quantitative data that measure how many are
discrete.
• Quantitative data that measure how much are
continuous because there is no separation between
the possible values for the data.
n Quantitative data are always numeric.
n Ordinary arithmetic operations are meaningful only
with quantitative data.

Cross-Sectional and Time Series Data

n Cross-sectional data are collected at the same or
approximately the same point in time.
• Example: data detailing the number of building
permits issued in June 2000 in each of the counties
of Texas
n Time series data are collected over several time
periods.
• Example: data detailing the number of building
permits issued in Travis County, Texas in each of the
last 36 months

Data Sources
n Existing Sources
• Data needed for a particular application might
already exist within a firm. Detailed information is
often kept on customers, suppliers, and employees
for example.
• Substantial amounts of business and economic data
are available from organizations that specialize in
collecting and maintaining data.

Data Sources
n Existing Sources
• Government agencies are another important source
of data.
• Data are also available from a variety of industry
associations and special-interest organizations.

Data Sources
n Internet
• The Internet has become an important source of
data.
• Most government agencies, like the Bureau of the
Census, make their data available
through a web site.
• More and more companies are creating web sites
and providing public access to them.
• A number of companies now specialize in making
information available over the Internet.

Data Sources
n Statistical Studies
• Statistical studies can be classified as either experimental
or observational.
• In experimental studies the variables of interest are first
identified. Then one or more factors are controlled so
that data can be obtained about how the factors influence
the variables.
• In observational (nonexperimental) studies no attempt is
made to control or influence the variables of interest.
• A survey is perhaps the most common type of
observational study.

Data Acquisition Considerations

n Time Requirement
• Searching for information can be time consuming.
• Information might no longer be useful by the time it
is available.
n Cost of Acquisition
• Organizations often charge for information even
when it is not their primary business activity.
n Data Errors
• Using any data that happen to be available or that
were acquired with little care can lead to poor and
misleading information.

Descriptive Statistics

n Descriptive statistics are the tabular, graphical, and
numerical methods used to summarize data.

Example: Hudson Auto Repair

The manager of Hudson Auto would like to have a better
understanding of the cost of parts used in the engine tune-
ups performed in the shop. She examines 50 customer
invoices for tune-ups. The costs of parts, rounded to the
nearest dollar, are listed below.

91 78 93 57 75 52 99 80 97 62
71 69 72 89 66 75 79 75 72 76
104 74 62 68 97 105 77 65 80 109
85 97 88 68 83 68 71 69 67 74
62 82 98 101 79 105 79 69 62 73

Example: Hudson Auto Repair

n Tabular Summary (Frequencies and Percent Frequencies)
Parts Cost ($)   Frequency   Percent Frequency
50-59                 2              4
60-69                13             26
70-79                16             32
80-89                 7             14
90-99                 7             14
100-109               5             10
Total                50            100

Example: Hudson Auto Repair

n Graphical Summary (Histogram)

[Histogram: Parts Cost ($) on the horizontal axis, from 50 to 110]

Example: Hudson Auto Repair

n Numerical Descriptive Statistics

• The most common numerical descriptive statistic is
the average (or mean).
• Hudson’s average cost of parts, based on the 50
tune-ups studied, is $79 (found by summing the 50
cost values and then dividing by 50).

Statistical Inference
n Statistical inference is the process of using data
obtained from a small group of elements (the sample)
to make estimates and test hypotheses about the
characteristics of a larger group of elements (the
population).

Example: Hudson Auto Repair

n Process of Statistical Inference
1. The population consists of all tune-ups. The average cost of parts is unknown.
2. A sample of 50 engine tune-ups is examined.
3. The sample data provide a sample average cost of $79 per tune-up.
4. The value of the sample average is used to make an estimate of the population average.

End of Chapter 1

Chapter 2
Descriptive Statistics:
Tabular and Graphical Methods

n Summarizing Qualitative Data

n Summarizing Quantitative Data
n Exploratory Data Analysis
n Crosstabulations and Scatter Diagrams

Summarizing Qualitative Data

n Frequency Distribution
n Relative Frequency
n Percent Frequency Distribution
n Bar Graph
n Pie Chart

Frequency Distribution
n A frequency distribution is a tabular summary of data
showing the frequency (or number) of items in each of
several nonoverlapping classes.
n The objective is to provide insights about the data that
cannot be quickly obtained by looking only at the
original data.

Example: Marada Inn

Guests staying at Marada Inn were asked to rate the quality
of their accommodations as being excellent, above average,
average, below average, or poor. The ratings provided by a
sample of 20 guests are shown below.

Below Average    Average          Above Average
Above Average    Above Average    Above Average
Above Average    Below Average    Below Average
Average          Poor             Poor
Above Average    Excellent        Above Average
Average          Above Average    Average
Above Average    Average

Example: Marada Inn

n Frequency Distribution

Rating Frequency
Poor 2
Below Average 3
Average 5
Above Average 9
Excellent 1
Total 20

Relative Frequency Distribution

n The relative frequency of a class is the fraction or
proportion of the total number of data items belonging
to the class.
n A relative frequency distribution is a tabular summary
of a set of data showing the relative frequency for each
class.

Percent Frequency Distribution

n The percent frequency of a class is the relative
frequency multiplied by 100.
n A percent frequency distribution is a tabular summary
of a set of data showing the percent frequency for each
class.

Example: Marada Inn

n Relative Frequency and Percent Frequency Distributions

Rating           Relative Frequency   Percent Frequency
Poor                   .10                  10
Below Average          .15                  15
Average                .25                  25
Above Average          .45                  45
Excellent              .05                   5
Total                 1.00                 100
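
The three distributions for the Marada Inn sample can be reproduced with a short Python sketch. This is an illustration only, not part of the original slides; the names `ratings`, `order`, and `table` are assumptions chosen for readability.

```python
from collections import Counter

# The 20 guest ratings from the Marada Inn example.
ratings = [
    "Below Average", "Average", "Above Average",
    "Above Average", "Above Average", "Above Average",
    "Above Average", "Below Average", "Below Average",
    "Average", "Poor", "Poor",
    "Above Average", "Excellent", "Above Average",
    "Average", "Above Average", "Average",
    "Above Average", "Average",
]

# Report the classes in their natural order rather than by count.
order = ["Poor", "Below Average", "Average", "Above Average", "Excellent"]

freq = Counter(ratings)
n = len(ratings)

# rating -> (frequency, relative frequency, percent frequency)
table = {r: (freq[r], freq[r] / n, 100 * freq[r] / n) for r in order}

for rating, (f, rel, pct) in table.items():
    print(f"{rating:<14} {f:>2}  {rel:.2f}  {pct:>3.0f}")
```

Keeping an explicit `order` list matters for ordinal data such as these ratings, because a plain count would otherwise list the classes in order of first appearance.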

Bar Graph
n A bar graph is a graphical device for depicting qualitative
data that have been summarized in a frequency, relative
frequency, or percent frequency distribution.
n On the horizontal axis we specify the labels that are used for
each of the classes.
n A frequency, relative frequency, or percent frequency scale
can be used for the vertical axis.
n Using a bar of fixed width drawn above each class label, we
extend the height appropriately.
n The bars are separated to emphasize the fact that each class
is a separate category.

Example: Marada Inn

n Bar Graph

[Bar graph of the quality ratings: one bar per rating class, with Rating on the horizontal axis]

Pie Chart
n The pie chart is a commonly used graphical device for
presenting relative frequency distributions for
qualitative data.
n First draw a circle; then use the relative frequencies to
subdivide the circle into sectors that correspond to the
relative frequency for each class.
n Since there are 360 degrees in a circle, a class with a
relative frequency of .25 would consume .25(360) =
90 degrees of the circle.

Example: Marada Inn

n Pie Chart
[Pie chart, Quality Ratings: Poor 10%, Below Average 15%, Average 25%, Above Average 45%, Excellent 5%]

Example: Marada Inn

n Insights Gained from the Preceding Pie Chart
• One-half of the customers surveyed gave Marada a
quality rating of “above average” or “excellent”
(looking at the left side of the pie). This might
please the manager.
• For each customer who gave an “excellent” rating,
there were two customers who gave a “poor” rating
(looking at the top of the pie). This should displease
the manager.

Summarizing Quantitative Data

n Frequency Distribution
n Relative Frequency and Percent Frequency
n Dot Plot
n Histogram
n Cumulative Distributions
n Ogive

Example: Hudson Auto Repair

The manager of Hudson Auto would like to get a better
picture of the distribution of costs for engine tune-up
parts. A sample of 50 customer invoices has been taken
and the costs of parts, rounded to the nearest dollar, are
listed below.
91 78 93 57 75 52 99 80 97 62
71 69 72 89 66 75 79 75 72 76
104 74 62 68 97 105 77 65 80 109
85 97 88 68 83 68 71 69 67 74
62 82 98 101 79 105 79 69 62 73

Frequency Distribution
n Guidelines for Selecting Number of Classes
• Use between 5 and 20 classes.
• Data sets with a larger number of elements usually
require a larger number of classes.
• Smaller data sets usually require fewer classes.

Frequency Distribution
n Guidelines for Selecting Width of Classes
• Use classes of equal width.
• Approximate class width:
\[ \text{Approximate Class Width} = \frac{\text{Largest Data Value} - \text{Smallest Data Value}}{\text{Number of Classes}} \]

Example: Hudson Auto Repair

n Frequency Distribution
If we choose six classes:
Approximate Class Width = (109 - 52)/6 = 9.5 ≈ 10
Cost ($) Frequency
50-59 2
60-69 13
70-79 16
80-89 7
90-99 7
100-109 5
Total 50
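
The class-width guideline and the resulting frequency distribution can be checked with a minimal Python sketch. It is an illustration under stated assumptions: the first class is started at 50, as on the slide (the guideline itself does not fix the starting point), and the names `approx_width`, `first_lower`, and `freq` are hypothetical.

```python
import math

# Parts costs from the 50 Hudson Auto Repair invoices.
costs = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

num_classes = 6
approx_width = (max(costs) - min(costs)) / num_classes  # (109 - 52) / 6
class_width = math.ceil(approx_width)                   # rounded up to 10

# Assumption: start the first class at 50, as the slide does.
first_lower = 50

freq = {}
for k in range(num_classes):
    lo = first_lower + k * class_width
    hi = lo + class_width - 1
    freq[f"{lo}-{hi}"] = sum(lo <= c <= hi for c in costs)

n = len(costs)
for label, f in freq.items():
    # class, frequency, relative frequency, percent frequency
    print(f"{label:<8} {f:>2}  {f / n:.2f}  {100 * f / n:>3.0f}")
```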

Example: Hudson Auto Repair

n Relative Frequency and Percent Frequency Distributions
Cost ($)    Relative Frequency   Percent Frequency
50-59             .04                   4
60-69             .26                  26
70-79             .32                  32
80-89             .14                  14
90-99             .14                  14
100-109           .10                  10
Total            1.00                 100

Example: Hudson Auto Repair

n Insights Gained from the Percent Frequency
• Only 4% of the parts costs are in the $50-59 class.
• 30% of the parts costs are under $70.
• The greatest percentage (32% or almost one-third)
of the parts costs are in the $70-79 class.
• 10% of the parts costs are $100 or more.

Dot Plot
n One of the simplest graphical summaries of data is a
dot plot.
n A horizontal axis shows the range of data values.
n Then each data value is represented by a dot placed
above the axis.

Example: Hudson Auto Repair

n Dot Plot

[Dot plot: one dot per invoice above a Cost ($) axis running from 50 to 110]

n Another common graphical presentation of quantitative data
is a histogram.
n The variable of interest is placed on the horizontal axis and
the frequency, relative frequency, or percent frequency is
placed on the vertical axis.
n A rectangle is drawn above each class interval with its height
corresponding to the interval’s frequency, relative frequency,
or percent frequency.
n Unlike a bar graph, a histogram has no natural separation
between rectangles of adjacent classes.

Example: Hudson Auto Repair

n Histogram

[Histogram: Parts Cost ($) on the horizontal axis, from 50 to 110]

Cumulative Distribution
n The cumulative frequency distribution shows the
number of items with values less than or equal to the
upper limit of each class.
n The cumulative relative frequency distribution shows
the proportion of items with values less than or equal
to the upper limit of each class.
n The cumulative percent frequency distribution shows
the percentage of items with values less than or equal
to the upper limit of each class.

Example: Hudson Auto Repair

n Cumulative Distributions
            Cumulative   Cumulative Relative   Cumulative Percent
Cost ($)    Frequency    Frequency             Frequency
≤ 59             2            .04                    4
≤ 69            15            .30                   30
≤ 79            31            .62                   62
≤ 89            38            .76                   76
≤ 99            45            .90                   90
≤ 109           50           1.00                  100
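
Each cumulative column is a running total of the class frequencies, as the following Python sketch shows. It is illustrative only; the list names are assumptions, and the class frequencies are taken from the Hudson Auto Repair frequency distribution.

```python
from itertools import accumulate

# Upper class limits and class frequencies from the Hudson example.
upper_limits = [59, 69, 79, 89, 99, 109]
frequencies = [2, 13, 16, 7, 7, 5]

# Running totals: number of items with values <= each upper limit.
cum_freq = list(accumulate(frequencies))
n = cum_freq[-1]

cum_rel = [c / n for c in cum_freq]        # cumulative relative frequency
cum_pct = [100 * c / n for c in cum_freq]  # cumulative percent frequency

for limit, cf, cr, cp in zip(upper_limits, cum_freq, cum_rel, cum_pct):
    print(f"<= {limit:<4} {cf:>2}  {cr:.2f}  {cp:>3.0f}")
```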

Exploratory Data Analysis

n The techniques of exploratory data analysis consist of
simple arithmetic and easy-to-draw pictures that can
be used to summarize data quickly.
n One such technique is the stem-and-leaf display.

Crosstabulations and Scatter Diagrams

n Thus far we have focused on methods that are used to
summarize the data for one variable at a time.
n Often a manager is interested in tabular and graphical
methods that will help understand the relationship
between two variables.
n Crosstabulation and a scatter diagram are two methods
for summarizing the data for two (or more) variables
simultaneously.

n Crosstabulation is a tabular method for summarizing the
data for two variables simultaneously.
n Crosstabulation can be used when:
• One variable is qualitative and the other is quantitative
• Both variables are qualitative
• Both variables are quantitative
n The left and top margin labels define the classes for the two
variables.

Example: Finger Lakes Homes

n Crosstabulation
The number of Finger Lakes homes sold for each style
and price for the past two years is shown below.
Price                    Home Style
Range        Colonial   Ranch   Split   A-Frame   Total
≤ $99,000       18         6      19       12       55
> $99,000       12        14      16        3       45
Total           30        20      35       15      100

Example: Finger Lakes Homes

n Insights Gained from the Preceding Crosstabulation

• The greatest number of homes in the sample (19)
are a split-level style and priced at less than or equal
to $99,000.
• Only three homes in the sample are an A-Frame style
and priced at more than $99,000.

Crosstabulation: Row or Column Percentages

n Converting the entries in the table into row percentages
or column percentages can provide additional insight
about the relationship between the two variables.

Example: Finger Lakes Homes

n Row Percentages

Price                    Home Style
Range        Colonial   Ranch   Split   A-Frame   Total
≤ $99,000     32.73     10.91   34.55    21.82     100
> $99,000     26.67     31.11   35.56     6.67     100

Note: row totals are actually 100.01 due to rounding.

Example: Finger Lakes Homes

n Column Percentages
Price                    Home Style
Range        Colonial   Ranch   Split   A-Frame
≤ $99,000     60.00     30.00   54.29    80.00
> $99,000     40.00     70.00   45.71    20.00
Total           100       100     100      100
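
Row percentages divide each count by its row total, and column percentages divide each count by its column total. The Python sketch below recomputes both for the Finger Lakes Homes table; it is an illustration only, and the dictionary layout and names are assumptions.

```python
styles = ["Colonial", "Ranch", "Split", "A-Frame"]

# Counts of homes sold, by price range (rows) and home style (columns).
counts = {
    "<= $99,000": [18, 6, 19, 12],
    "> $99,000": [12, 14, 16, 3],
}

# Row percentages: each count as a share of its price-range total.
row_pct = {
    price: [round(100 * c / sum(row), 2) for c in row]
    for price, row in counts.items()
}

# Column percentages: each count as a share of its home-style total.
col_totals = [sum(col) for col in zip(*counts.values())]
col_pct = {
    price: [round(100 * c / t, 2) for c, t in zip(row, col_totals)]
    for price, row in counts.items()
}

print("Row percentages:", row_pct)
print("Column percentages:", col_pct)
```

Row percentages answer "of the homes in this price range, what share is each style?", while column percentages answer "of the homes of this style, what share falls in each price range?".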

Scatter Diagram
n A scatter diagram is a graphical presentation of the
relationship between two quantitative variables.
n One variable is shown on the horizontal axis and the
other variable is shown on the vertical axis.
n The general pattern of the plotted points suggests the
overall relationship between the variables.

Scatter Diagram
n A Positive Relationship

Scatter Diagram
n A Negative Relationship

Scatter Diagram
n No Apparent Relationship

Example: Panthers Football Team

n Scatter Diagram
The Panthers football team is interested in investigating
the relationship, if any, between interceptions made and
points scored.
x = Number of     y = Number of
Interceptions     Points Scored
      1                14
      3                24
      2                18
      1                17
      3                27

Example: Panthers Football Team

n Scatter Diagram
[Scatter diagram: Number of Interceptions (x, 0 to 3) on the horizontal axis, Number of Points Scored (y) on the vertical axis]

Example: Panthers Football Team

n The preceding scatter diagram indicates a positive
relationship between the number of interceptions and
the number of points scored.
n Higher points scored are associated with a higher
number of interceptions.
n The relationship is not perfect; not all plotted points in the
scatter diagram lie on a straight line.

Tabular and Graphical Procedures

Qualitative Data
• Tabular Methods: Frequency Distribution, Rel. Freq. Dist., % Freq. Dist., Crosstabulation
• Graphical Methods: Bar Graph, Pie Chart

Quantitative Data
• Tabular Methods: Frequency Distribution, Rel. Freq. Dist., Cum. Freq. Dist., Cum. Rel. Freq.
• Graphical Methods: Dot Plot

End of Chapter 2

Chapter 3
Descriptive Statistics: Numerical Methods
n Measures of Location
n Measures of Variability
n Measures of Relative Location and Detecting Outliers
n Exploratory Data Analysis
n Measures of Association Between Two Variables
n The Weighted Mean and
Working with Grouped Data

Measures of Location
n Mean
n Median
n Mode
n Percentiles
n Quartiles

Example: Apartment Rents

Given below is a sample of monthly rent values ($) for one-
bedroom apartments. The data are a sample of 70 apartments
in a particular city. The data are presented in ascending
order.
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

n The mean of a data set is the average of all the data
values.
n If the data are from a sample, the mean is denoted by x̄.
\[ \bar{x} = \frac{\sum x_i}{n} \]
n If the data are from a population, the mean is denoted
by μ (mu).
\[ \mu = \frac{\sum x_i}{N} \]

Example: Apartment Rents

n Mean
\[ \bar{x} = \frac{\sum x_i}{n} = \frac{34{,}356}{70} = 490.80 \]

n The median is the measure of location most often
reported for annual income and property value data.
n A few extremely large incomes or property values can
inflate the mean.

n The median of a data set is the value in the middle
when the data items are arranged in ascending order.
n For an odd number of observations, the median is the
middle value.
n For an even number of observations, the median is the
average of the two middle values.

Example: Apartment Rents

n Median
Median = 50th percentile
i = (p/100)n = (50/100)70 = 35
Because i is an integer, average the 35th and 36th data values:
Median = (475 + 475)/2 = 475

n The mode of a data set is the value that occurs with
greatest frequency.
n The greatest frequency can occur at two or more
different values.
n If the data have exactly two modes, the data are
bimodal.
n If the data have more than two modes, the data are
multimodal.

Example: Apartment Rents

n Mode
450 occurred most frequently (7 times)
Mode = 450
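
The mean, median, and mode of the rent sample can be verified with Python's standard `statistics` module. This is a minimal sketch rather than part of the original slides; `statistics.multimode` (Python 3.8 and later) is included because it returns every value tied for the greatest frequency, which is how bimodal or multimodal data would show up.

```python
import statistics

# The 70 monthly rents from the Apartment Rents example (ascending order).
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

mean = statistics.mean(rents)        # sum of the values divided by n
median = statistics.median(rents)    # n is even: average of the two middle values
mode = statistics.mode(rents)        # value with the greatest frequency
modes = statistics.multimode(rents)  # all values tied for the greatest frequency

print(f"mean = {mean:.2f}, median = {median}, mode = {mode}, modes = {modes}")
```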

n A percentile provides information about how the data
are spread over the interval from the smallest value to
the largest value.
n Admission test scores for colleges and universities are
frequently reported in terms of percentiles.

n The pth percentile of a data set is a value such that at least p
percent of the items take on this value or less and at least
(100 - p) percent of the items take on this value or more.
• Arrange the data in ascending order.
• Compute index i, the position of the pth percentile.
i = (p/100)n
• If i is not an integer, round up. The pth percentile is the
value in the ith position.
• If i is an integer, the pth percentile is the average of the
values in positions i and i + 1.
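
The steps above can be sketched as a small Python function. This is an illustration of the slide's index rule only; the function name `percentile` is an assumption, and other software often uses different interpolation conventions, so results may differ slightly from library defaults.

```python
import math

def percentile(sorted_data, p):
    """Return the pth percentile of ascending data using the slide's index rule."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    n = len(sorted_data)
    # Computing p * n / 100 keeps whole-number positions exact (90 * 70 / 100 = 63.0).
    i = p * n / 100
    if i != int(i):
        # Not an integer: round up and take the value in that position (1-based).
        return sorted_data[math.ceil(i) - 1]
    # Integer: average the values in positions i and i + 1 (1-based).
    i = int(i)
    return (sorted_data[i - 1] + sorted_data[i]) / 2

# The 70 monthly rents from the Apartment Rents example (ascending order).
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

for p in (25, 50, 75, 90):
    print(f"{p}th percentile = {percentile(rents, p)}")
```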

Example: Apartment Rents

n 90th Percentile
i = (p/100)n = (90/100)70 = 63
Averaging the 63rd and 64th data values:
90th Percentile = (580 + 590)/2 = 585

n Quartiles are specific percentiles

n First Quartile = 25th Percentile
n Second Quartile = 50th Percentile = Median
n Third Quartile = 75th Percentile

Example: Apartment Rents

n Third Quartile
Third quartile = 75th percentile
i = (p/100)n = (75/100)70 = 52.5, rounded up to 53
Third quartile = 525

Measures of Variability
n It is often desirable to consider measures of variability
(dispersion), as well as measures of location.
n For example, in choosing supplier A or supplier B we
might consider not only the average delivery time for
each, but also the variability in delivery time for each.

Measures of Variability
n Range
n Interquartile Range
n Variance
n Standard Deviation
n Coefficient of Variation

n The range of a data set is the difference between the
largest and smallest data values.
n It is the simplest measure of variability.
n It is very sensitive to the smallest and largest data
values.

Example: Apartment Rents

n Range
Range = largest value - smallest value
Range = 615 - 425 = 190

Interquartile Range
n The interquartile range of a data set is the difference
between the third quartile and the first quartile.
n It is the range for the middle 50% of the data.
n It overcomes the sensitivity to extreme data values.

Example: Apartment Rents

n Interquartile Range
3rd Quartile (Q3) = 525
1st Quartile (Q1) = 445
Interquartile Range = Q3 - Q1 = 525 - 445 = 80

n The variance is a measure of variability that utilizes
all the data.
n It is based on the difference between the value of each
observation (x_i) and the mean (x̄ for a sample, μ for a
population).

n The variance is the average of the squared differences
between each data value and the mean.
n If the data set is a sample, the variance is denoted by s².
\[ s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} \]
n If the data set is a population, the variance is denoted by σ².
\[ \sigma^2 = \frac{\sum (x_i - \mu)^2}{N} \]
Standard Deviation
n The standard deviation of a data set is the positive square
root of the variance.
n It is measured in the same units as the data, making it
easier than the variance to compare with the mean.
n If the data set is a sample, the standard deviation is denoted
s.
\[ s = \sqrt{s^2} \]
n If the data set is a population, the standard deviation is
denoted σ (sigma).
\[ \sigma = \sqrt{\sigma^2} \]

Coefficient of Variation
n The coefficient of variation indicates how large the
standard deviation is in relation to the mean.
n If the data set is a sample, the coefficient of variation is
computed as follows:
\[ \frac{s}{\bar{x}} \times 100 \]
n If the data set is a population, the coefficient of variation is
computed as follows:
\[ \frac{\sigma}{\mu} \times 100 \]


Example: Apartment Rents

n Variance
\[ s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = 2{,}996.16 \]

n Standard Deviation
\[ s = \sqrt{s^2} = \sqrt{2{,}996.16} = 54.74 \]

n Coefficient of Variation
\[ \frac{s}{\bar{x}} \times 100 = \frac{54.74}{490.80} \times 100 = 11.15 \]
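
The sample variance, standard deviation, and coefficient of variation for the rent data can be recomputed directly from their definitions. The Python sketch below is illustrative only; the variable names are assumptions, and the formulas use the sample divisor n - 1.

```python
import math

# The 70 monthly rents from the Apartment Rents example.
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

n = len(rents)
xbar = sum(rents) / n

# Sample variance: sum of squared deviations divided by n - 1.
s2 = sum((x - xbar) ** 2 for x in rents) / (n - 1)
s = math.sqrt(s2)    # sample standard deviation
cv = s / xbar * 100  # coefficient of variation, in percent

print(f"variance = {s2:.2f}, std dev = {s:.2f}, CV = {cv:.2f}")
```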

Measures of Relative Location
and Detecting Outliers
n z-Scores
n Chebyshev’s Theorem
n Empirical Rule
n Detecting Outliers

n The z-score is often called the standardized value.
n It denotes the number of standard deviations a data value x_i
is from the mean.
\[ z_i = \frac{x_i - \bar{x}}{s} \]
n A data value less than the sample mean will have a z-score
less than zero.
n A data value greater than the sample mean will have a
z-score greater than zero.
n A data value equal to the sample mean will have a z-score of
zero.

Example: Apartment Rents

n z-Score of Smallest Value (425)
\[ z = \frac{x_i - \bar{x}}{s} = \frac{425 - 490.80}{54.74} = -1.20 \]

Standardized Values for Apartment Rents
-1.20 -1.11 -1.11 -1.02 -1.02 -1.02 -1.02 -1.02 -0.93 -0.93
-0.93 -0.93 -0.93 -0.84 -0.84 -0.84 -0.84 -0.84 -0.75 -0.75
-0.75 -0.75 -0.75 -0.75 -0.75 -0.56 -0.56 -0.56 -0.47 -0.47
-0.47 -0.38 -0.38 -0.34 -0.29 -0.29 -0.29 -0.20 -0.20 -0.20
-0.20 -0.11 -0.01 -0.01 -0.01 0.17 0.17 0.17 0.17 0.35
0.35 0.44 0.62 0.62 0.62 0.81 1.06 1.08 1.45 1.45
1.54 1.54 1.63 1.81 1.99 1.99 1.99 1.99 2.27 2.27

Chebyshev’s Theorem
At least (1 - 1/k²) of the items in any data set will be
within k standard deviations of the mean, where k is
any value greater than 1.
• At least 75% of the items must be within
k = 2 standard deviations of the mean.
• At least 89% of the items must be within
k = 3 standard deviations of the mean.
• At least 94% of the items must be within
k = 4 standard deviations of the mean.

Example: Apartment Rents

n Chebyshev’s Theorem

Let k = 1.5 with x̄ = 490.80 and s = 54.74.

At least (1 - 1/(1.5)²) = 1 - 0.44 = 0.56 or 56%
of the rent values must be between
x̄ - k(s) = 490.80 - 1.5(54.74) = 409
and
x̄ + k(s) = 490.80 + 1.5(54.74) = 573

Example: Apartment Rents

n Chebyshev’s Theorem (continued)
Actually, 86% of the rent values
are between 409 and 573.
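
The Chebyshev bound for k = 1.5 and the share of rents actually inside the interval can be compared with a short Python sketch. It is an illustration only; the variable names are assumptions, and the mean and standard deviation are recomputed from the data rather than typed in.

```python
import statistics

# The 70 monthly rents from the Apartment Rents example.
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

xbar = statistics.mean(rents)
s = statistics.stdev(rents)  # sample standard deviation (divisor n - 1)

k = 1.5
bound = 1 - 1 / k**2                     # Chebyshev's minimum proportion
lower, upper = xbar - k * s, xbar + k * s

inside = sum(lower <= x <= upper for x in rents)
share = inside / len(rents)              # proportion actually observed

print(f"guaranteed >= {bound:.2f}, observed = {share:.2f} "
      f"between {lower:.0f} and {upper:.0f}")
```

The theorem gives only a lower bound that holds for any data set, which is why the observed share can be much larger than the guaranteed one.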

Empirical Rule
For data having a bell-shaped distribution:
• Approximately 68% of the data values will be within
one standard deviation of the mean.
• Approximately 95% of the data values will be within
two standard deviations of the mean.
• Almost all (99.7%) of the items will be within three
standard deviations of the mean.

Example: Apartment Rents

n Empirical Rule
                Interval            % in Interval
Within +/- 1s   436.06 to 545.54    48/70 = 69%
Within +/- 2s   381.32 to 600.28    68/70 = 97%
Within +/- 3s   326.58 to 655.02    70/70 = 100%
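
The counts in the table can be reproduced by counting how many rents lie within one, two, and three standard deviations of the mean. The Python sketch below is illustrative only; the dictionary name `counts` is an assumption.

```python
import statistics

# The 70 monthly rents from the Apartment Rents example.
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

xbar = statistics.mean(rents)
s = statistics.stdev(rents)
n = len(rents)

# k -> number of rents within k standard deviations of the mean
counts = {k: sum(abs(x - xbar) <= k * s for x in rents) for k in (1, 2, 3)}

for k, c in counts.items():
    print(f"within +/- {k}s: {c}/{n} = {100 * c / n:.0f}%")
```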

Detecting Outliers
n An outlier is an unusually small or unusually large value in a
data set.
n A data value with a z-score less than -3 or greater than +3
might be considered an outlier.
n It might be an incorrectly recorded data value.
n It might be a data value that was incorrectly included in the
data set.
n It might be a correctly recorded data value that belongs in
the data set!

Example: Apartment Rents

n Detecting Outliers
The most extreme z-scores are -1.20 and 2.27.
Using |z| > 3 as the criterion for an outlier,
there are no outliers in this data set.
Standardized Values for Apartment Rents
-1.20 -1.11 -1.11 -1.02 -1.02 -1.02 -1.02 -1.02 -0.93 -0.93
-0.93 -0.93 -0.93 -0.84 -0.84 -0.84 -0.84 -0.84 -0.75 -0.75
-0.75 -0.75 -0.75 -0.75 -0.75 -0.56 -0.56 -0.56 -0.47 -0.47
-0.47 -0.38 -0.38 -0.34 -0.29 -0.29 -0.29 -0.20 -0.20 -0.20
-0.20 -0.11 -0.01 -0.01 -0.01 0.17 0.17 0.17 0.17 0.35
0.35 0.44 0.62 0.62 0.62 0.81 1.06 1.08 1.45 1.45
1.54 1.54 1.63 1.81 1.99 1.99 1.99 1.99 2.27 2.27
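
The standardized values and the |z| > 3 outlier check can be sketched in a few lines of Python. This is an illustration only; the names `z_scores` and `outliers` are assumptions.

```python
import statistics

# The 70 monthly rents from the Apartment Rents example.
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

xbar = statistics.mean(rents)
s = statistics.stdev(rents)

# z-score: number of standard deviations each value lies from the mean.
z_scores = [(x - xbar) / s for x in rents]

# Flag values more than three standard deviations from the mean.
outliers = [x for x, z in zip(rents, z_scores) if abs(z) > 3]

print(f"min z = {min(z_scores):.2f}, max z = {max(z_scores):.2f}, "
      f"outliers = {outliers}")
```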

Exploratory Data Analysis

n Five-Number Summary
n Box Plot

Five-Number Summary
n Smallest Value
n First Quartile
n Median
n Third Quartile
n Largest Value

Example: Apartment Rents

n Five-Number Summary
Lowest Value = 425
First Quartile = 445
Median = 475
Third Quartile = 525
Largest Value = 615

Box Plot
n A box is drawn with its ends located at the first and third
quartiles.
n A vertical line is drawn in the box at the location of the
median.
n Limits are located (not drawn) using the interquartile range
(IQR).
• The lower limit is located 1.5(IQR) below Q1.
• The upper limit is located 1.5(IQR) above Q3.
• Data outside these limits are considered outliers.

Box Plot (Continued)

n Whiskers (dashed lines) are drawn from the ends of the
box to the smallest and largest data values inside the
limits.
n The location of each outlier is marked with a symbol.

Example: Apartment Rents

n Box Plot

Lower Limit: Q1 - 1.5(IQR) = 445 - 1.5(80) = 325

Upper Limit: Q3 + 1.5(IQR) = 525 + 1.5(80) = 645
There are no outliers.

[Box plot of the rent data; horizontal axis from 375 to 625]
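
The five-number summary and the box plot limits can be computed together. The Python sketch below is an illustration only and applies the percentile index rule from the Measures of Location slides, which gives a first quartile of 445 (the value used on the Interquartile Range slide) and therefore IQR = 80; the helper name `percentile` is an assumption.

```python
import math

def percentile(sorted_data, p):
    """pth percentile of ascending data (0 < p < 100) by the slides' index rule."""
    i = p * len(sorted_data) / 100
    if i != int(i):
        return sorted_data[math.ceil(i) - 1]  # round up, 1-based position
    i = int(i)
    return (sorted_data[i - 1] + sorted_data[i]) / 2

# The 70 monthly rents from the Apartment Rents example (ascending order).
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

q1, median, q3 = (percentile(rents, p) for p in (25, 50, 75))
five_number = (min(rents), q1, median, q3, max(rents))

iqr = q3 - q1
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Values outside the limits would be plotted as outliers.
outliers = [x for x in rents if x < lower_limit or x > upper_limit]

print("five-number summary:", five_number)
print(f"IQR = {iqr}, limits = ({lower_limit}, {upper_limit}), outliers = {outliers}")
```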

Measures of Association
Between Two Variables
n Covariance
n Correlation Coefficient

n The covariance is a measure of the linear association
between two variables.
n Positive values indicate a positive relationship.
n Negative values indicate a negative relationship.
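
As an illustration, the sample covariance and the sample correlation coefficient can be computed for the Panthers football data from Chapter 2. The sample covariance divides the sum of the cross-products of deviations by n - 1, and the correlation coefficient divides the covariance by the product of the two sample standard deviations. This Python sketch is not part of the original slides, and its variable names are assumptions.

```python
import math

# Panthers football data: interceptions (x) and points scored (y).
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sample covariance: cross-products of deviations divided by n - 1.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

# Sample standard deviations of x and y.
s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))

# Sample correlation coefficient.
r_xy = s_xy / (s_x * s_y)

print(f"covariance = {s_xy:.2f}, correlation = {r_xy:.2f}")
```

The positive covariance and a correlation close to +1 agree with the positive relationship seen in the scatter diagram for this data.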

n If the data sets are samples, the covariance is denoted
by s_xy.
\[ s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} \]

n If the data sets are populations, the covariance is
denoted by σ_xy.
\[ \sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N} \]

Correlation Coefficient
n The coefficient can take on values between -1 and +1.
n Values near -1 indicate a strong negative linear
relationship.
n Values near +1 indicate a strong positive linear
relationship.
n If the data sets are samples, the coefficient is r_xy.
\[ r_{xy} = \frac{s_{xy}}{s_x s_y} \]
n If the data sets are populations, the coefficient is ρ_xy.
\[ \rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y} \]

The Weighted Mean and
Working with Grouped Data

n Weighted Mean
n Mean for Grouped Data
n Variance for Grouped Data
n Standard Deviation for Grouped Data

Weighted Mean
n When the mean is computed by giving each data value a
weight that reflects its importance, it is referred to as a
weighted mean.
n In the computation of a grade point average (GPA), the
weights are the number of credit hours earned for each
grade.
n When data values vary in importance, the analyst must
choose the weight that best reflects the importance of each
value.

Weighted Mean
\[ \bar{x} = \frac{\sum w_i x_i}{\sum w_i} \]
where:
x_i = value of observation i
w_i = weight for observation i
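
A weighted mean can be sketched in Python as follows. The grade points and credit hours are hypothetical values invented for illustration (they do not come from the slides), and the function name `weighted_mean` is likewise an assumption.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight

# Hypothetical GPA example: grade points weighted by credit hours earned.
grade_points = [4.0, 3.0, 2.0]  # hypothetical grades A, B, C
credit_hours = [9, 15, 6]       # hypothetical credit hours at each grade

gpa = weighted_mean(grade_points, credit_hours)
print(f"GPA = {gpa:.2f}")
```

With equal weights the function reduces to the ordinary mean, which is a quick way to sanity-check it.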

Grouped Data
n The weighted mean computation can be used to obtain
approximations of the mean, variance, and standard
deviation for the grouped data.
n To compute the weighted mean, we treat the midpoint of
each class as though it were the mean of all items in the
class.
n We compute a weighted mean of the class midpoints using
the class frequencies as weights.
n Similarly, in computing the variance and standard
deviation, the class frequencies are used as weights.

Mean for Grouped Data

n Sample Data
\[ \bar{x} = \frac{\sum f_i M_i}{\sum f_i} \]

n Population Data
\[ \mu = \frac{\sum f_i M_i}{N} \]
where:
f_i = frequency of class i
M_i = midpoint of class i

Example: Apartment Rents

Given below is the previous sample of monthly rents for one
bedroom apartments presented here as grouped data in the
form of a frequency distribution.
Rent ($) Frequency
420-439 8
440-459 17
460-479 12
480-499 8
500-519 7
520-539 4
540-559 2
560-579 4
580-599 2
600-619 6

Example: Apartment Rents

n Mean for Grouped Data
Rent ($)    f_i    M_i      f_i M_i
420-439       8    429.5     3436.0
440-459      17    449.5     7641.5
460-479      12    469.5     5634.0
480-499       8    489.5     3916.0
500-519       7    509.5     3566.5
520-539       4    529.5     2118.0
540-559       2    549.5     1099.0
560-579       4    569.5     2278.0
580-599       2    589.5     1179.0
600-619       6    609.5     3657.0
Total        70             34525.0

\[ \bar{x} = \frac{34{,}525}{70} = 493.21 \]

This approximation differs by $2.41 from the actual sample mean of $490.80.

Variance for Grouped Data

n Sample Data
\[ s^2 = \frac{\sum f_i (M_i - \bar{x})^2}{n - 1} \]

n Population Data
\[ \sigma^2 = \frac{\sum f_i (M_i - \mu)^2}{N} \]

Example: Apartment Rents

n Variance for Grouped Data
\[ s^2 = 3{,}017.89 \]

n Standard Deviation for Grouped Data
\[ s = \sqrt{3{,}017.89} = 54.94 \]

This approximation differs by only $.20
from the actual standard deviation of $54.74.
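
The grouped-data approximations can be reproduced from the frequency distribution alone. The Python sketch below is illustrative only; it treats each class midpoint as the value of every item in the class and uses the sample divisor n - 1, and its variable names are assumptions.

```python
import math

# (lower limit, upper limit, frequency) for each rent class.
classes = [
    (420, 439, 8), (440, 459, 17), (460, 479, 12), (480, 499, 8),
    (500, 519, 7), (520, 539, 4), (540, 559, 2), (560, 579, 4),
    (580, 599, 2), (600, 619, 6),
]

freqs = [f for _, _, f in classes]
midpoints = [(lo + hi) / 2 for lo, hi, _ in classes]
n = sum(freqs)

# Weighted mean of the class midpoints, with frequencies as weights.
mean_g = sum(f * m for f, m in zip(freqs, midpoints)) / n

# Grouped sample variance and standard deviation.
var_g = sum(f * (m - mean_g) ** 2 for f, m in zip(freqs, midpoints)) / (n - 1)
sd_g = math.sqrt(var_g)

print(f"mean = {mean_g:.2f}, variance = {var_g:.2f}, std dev = {sd_g:.2f}")
```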

