Statistical Analysis BSA
Nominal
• Data are labels or names used to identify an
attribute of the element.
• A nonnumeric label or a numeric code may be
used.
Nominal
• Example:
Students of a university are classified by the
school in which they are enrolled using a
nonnumeric label such as Business,
Humanities, Education, and so on.
Alternatively, a numeric code could be used for
the school variable (e.g. 1 denotes Business, 2
denotes Humanities, 3 denotes Education, and
so on).
Ordinal
• The data have the properties of nominal data and
the order or rank of the data is meaningful.
• A nonnumeric label or a numeric code may be
used.
Ordinal
• Example:
Students of a university are classified by their
class standing using a nonnumeric label such as
Freshman, Sophomore, Junior, or Senior.
Alternatively, a numeric code could be used for
the class standing variable (e.g. 1 denotes
Freshman, 2 denotes Sophomore, and so on).
Interval
• The data have the properties of ordinal data and
the interval between observations is expressed in
terms of a fixed unit of measure.
• Interval data are always numeric.
Interval
• Example:
Melissa has an SAT score of 1205, while Kevin
has an SAT score of 1090. Melissa scored 115
points more than Kevin.
Ratio
• The data have all the properties of interval data
and the ratio of two values is meaningful.
• Variables such as distance, height, weight, and
time use the ratio scale.
• This scale must contain a zero value that indicates
that nothing exists for the variable at the zero
point.
Ratio
• Example:
Melissa’s college record shows 36 credit hours
earned, while Kevin’s record shows 72 credit
hours earned. Kevin has twice as many credit
hours earned as Melissa.
Existing Sources
• Data needed for a particular application might
already exist within a firm. Detailed information
is often kept on customers, suppliers, and
employees for example.
• Substantial amounts of business and economic
data are available from organizations that
specialize in collecting and maintaining data.
Existing Sources
• Government agencies are another important
source of data.
• Data are also available from a variety of industry
associations and special-interest organizations.
Internet
• The Internet has become an important source of
data.
• Most government agencies, like the Bureau of the
Census (www.census.gov), make their data
available through a web site.
• More and more companies are creating web sites
and providing public access to them.
• A number of companies now specialize in making
information available over the Internet.
Statistical Studies
• Statistical studies can be classified as either
experimental or observational.
• In experimental studies the variables of interest
are first identified. Then one or more factors are
controlled so that data can be obtained about how
the factors influence the variables.
• In observational (nonexperimental) studies no
attempt is made to control or influence the
variables of interest; an example is a survey.
Time Requirement
• Searching for information can be time consuming.
• Information might no longer be useful by the time
it is available.
Cost of Acquisition
• Organizations often charge for information even
when it is not their primary business activity.
Data Errors
• Using any data that happens to be available or
that were acquired with little care can lead to poor
and misleading information.
91 78 93 57 75 52 99 80 97 62
71 69 72 89 66 75 79 75 72 76
104 74 62 68 97 105 77 65 80 109
85 97 88 68 83 68 71 69 67 74
62 82 98 101 79 105 79 69 62 73
[Histogram: frequency (2-12) of Parts Cost ($), 50 to 110]
1. Population consists of all tune-ups. Average cost of parts is unknown.
2. A sample of 50 engine tune-ups is examined.
Frequency Distribution
Relative Frequency
Percent Frequency Distribution
Bar Graph
Pie Chart
Frequency Distribution
Rating Frequency
Poor 2
Below Average 3
Average 5
Above Average 9
Excellent 1
Total 20
Bar Graph
[Bar graph: Frequency (1-9) by Rating: Poor, Below Average, Average, Above Average, Excellent]
Pie Chart
[Pie chart "Quality Ratings": Poor 10%, Below Average 15%, Average 25%, Above Average 45%, Excellent 5%]
Frequency Distribution
Relative Frequency and Percent Frequency
Distributions
Dot Plot
Histogram
Cumulative Distributions
Ogive
Frequency Distribution
If we choose six classes:
Approximate Class Width = (109 − 52)/6 = 9.5 ≈ 10
Cost ($) Frequency
50-59 2
60-69 13
70-79 16
80-89 7
90-99 7
100-109 5
Total 50
Cost ($)   Relative Frequency   Percent Frequency
50-59 .04 4
60-69 .26 26
70-79 .32 32
80-89 .14 14
90-99 .14 14
100-109 .10 10
Total 1.00 100
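The three distributions above can be reproduced with a short Python sketch (a minimal illustration; the data and class limits are those shown on the slides):

```python
# Frequency, relative frequency, and percent frequency distributions
# for the 50 parts costs listed earlier in the deck.
costs = [91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
         71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
         104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
         85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
         62, 82, 98, 101, 79, 105, 79, 69, 62, 73]

n = len(costs)
classes = [(50, 59), (60, 69), (70, 79), (80, 89), (90, 99), (100, 109)]

for lo, hi in classes:
    freq = sum(1 for c in costs if lo <= c <= hi)   # class frequency
    rel = freq / n                                  # relative frequency
    print(f"{lo}-{hi}: {freq}  {rel:.2f}  {100 * rel:.0f}%")
```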
Dot Plot
[Dot plot: the 50 parts costs plotted as dots along a Cost ($) axis from 50 to 110]
Histogram
[Histogram: Parts Cost ($), 50 to 110, against frequency (2-18)]
© 2003 Thomson/South-Western Slide
Cumulative Distributions
Cost ($)   Cumulative Frequency   Cumulative Relative Frequency   Cumulative Percent Frequency
< 59 2 .04 4
< 69 15 .30 30
< 79 31 .62 62
< 89 38 .76 76
< 99 45 .90 90
< 109 50 1.00 100
Ogive
• Because the class limits for the parts-cost data are
50-59, 60-69, and so on, there appear to be one-unit
gaps from 59 to 60, 69 to 70, and so on.
• These gaps are eliminated by plotting points
halfway between the class limits.
• Thus, 59.5 is used for the 50-59 class, 69.5 is used
for the 60-69 class, and so on.
[Ogive: cumulative percent frequency (20-100) plotted against Parts Cost ($), 50 to 110, using points halfway between the class limits]
Stem-and-Leaf Display
5 2 7
6 2 2 2 2 5 6 7 8 8 8 9 9 9
7 1 1 2 2 3 4 4 5 5 5 6 7 8 9 9 9
8 0 0 2 3 5 8 9
9 1 3 7 7 7 8 9
10 1 4 5 5 9
Leaf Units
• A single digit is used to define each leaf.
• In the preceding example, the leaf unit was 1.
• Leaf units may be 100, 10, 1, 0.1, and so on.
• Where the leaf unit is not shown, it is assumed to
equal 1.
Leaf Unit = 10
16 8
17 1 9
18 0 3
19 1 7
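A stem-and-leaf display with leaf unit 1, like the parts-cost display above, can be sketched as:

```python
# Stem-and-leaf display (leaf unit = 1): the tens digits form the stem,
# the units digit the leaf.
from collections import defaultdict

costs = [91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
         71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
         104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
         85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
         62, 82, 98, 101, 79, 105, 79, 69, 62, 73]

stems = defaultdict(list)
for c in sorted(costs):
    stems[c // 10].append(c % 10)   # stem = tens digits, leaf = units digit

for stem in sorted(stems):
    print(stem, " ".join(str(leaf) for leaf in stems[stem]))
```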
Crosstabulation
The number of Finger Lakes homes sold for each
style and price for the past two years is shown below.
Price Range   Colonial   Ranch   Split   A-Frame   Total
< $99,000 18 6 19 12 55
> $99,000 12 14 16 3 45
Total 30 20 35 15 100
Row Percentages
Column Percentages
[Scatter plots of y against x illustrating a positive relationship, a negative relationship, and no apparent relationship]
Scatter Diagram
The Panthers football team is interested in
investigating the relationship, if any, between
interceptions made and points scored.
x = Number of Interceptions   y = Number of Points Scored
1 14
3 24
2 18
1 17
3 27
Scatter Diagram
[Scatter diagram: Number of Interceptions (x, 0-3) against Number of Points Scored (y, 0-30)]
Example: Panthers Football Team
Mean
Median
Mode
Percentiles
Quartiles
Mean
x̄ = Σxi/n = 34,356/70 = 490.80
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
Median
Median = 50th percentile
i = (p/100)n = (50/100)70 = 35
Since i is an integer, average the 35th and 36th data values:
Median = (475 + 475)/2 = 475
Mode
450 occurred most frequently (7 times)
Mode = 450
90th Percentile
i = (p/100)n = (90/100)70 = 63
Averaging the 63rd and 64th data values:
90th Percentile = (580 + 590)/2 = 585
Third Quartile
Third quartile = 75th percentile
i = (p/100)n = (75/100)70 = 52.5; since i is not an integer, round up to 53
Third quartile = 525
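The percentile rule used in these slides (i = (p/100)n; average the ith and (i+1)th ordered values when i is an integer, otherwise round i up) can be sketched as:

```python
import math

def percentile(data, p):
    """Slide rule: i = (p/100)n; if i is an integer, average the ith and
    (i+1)th ordered values, otherwise round i up and take that value."""
    data = sorted(data)
    i = (p / 100) * len(data)
    if i == int(i):
        i = int(i)
        return (data[i - 1] + data[i]) / 2
    return data[math.ceil(i) - 1]

# The 70 monthly apartment rents listed above.
rents = [425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
         440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
         450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
         465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
         480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
         510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
         575, 575, 580, 590, 600, 600, 600, 600, 615, 615]

print(percentile(rents, 50))   # median
print(percentile(rents, 75))   # third quartile
print(percentile(rents, 90))   # 90th percentile
```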
Range
Interquartile Range
Variance
Standard Deviation
Coefficient of Variation
Range
Range = largest value - smallest value
Range = 615 - 425 = 190
Interquartile Range
3rd Quartile (Q3) = 525
1st Quartile (Q1) = 445
Interquartile Range = Q3 - Q1 = 525 - 445 = 80
Variance
s² = Σ(xi − x̄)² / (n − 1) = 2,996.16
Standard Deviation
s = √s² = √2,996.16 = 54.74
Coefficient of Variation
CV = (s/x̄)(100) = (54.74/490.80)(100) = 11.15%
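The variance, standard deviation, and coefficient of variation above can be checked with a minimal sketch over the same 70 rents:

```python
# Sample variance, standard deviation, and coefficient of variation
# for the 70 apartment rents listed earlier.
rents = [425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
         440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
         450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
         465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
         480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
         510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
         575, 575, 580, 590, 600, 600, 600, 600, 615, 615]

n = len(rents)
mean = sum(rents) / n                                # x-bar
s2 = sum((x - mean) ** 2 for x in rents) / (n - 1)   # sample variance
s = s2 ** 0.5                                        # standard deviation
cv = (s / mean) * 100                                # coefficient of variation (%)
print(round(mean, 2), round(s2, 2), round(s, 2), round(cv, 2))
```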
The Weighted Mean and
Working with Grouped Data
Measures of Relative Location
and Detecting Outliers
z-Scores
Chebyshev’s Theorem
Empirical Rule
Detecting Outliers
Chebyshev’s Theorem
Empirical Rule
Interval % in Interval
Within ±1s: 436.06 to 545.54   48/70 = 69%
Within ±2s: 381.32 to 600.28   68/70 = 97%
Within ±3s: 326.58 to 655.02   70/70 = 100%
Detecting Outliers
The most extreme z-scores are -1.20 and 2.27.
Using |z| > 3 as the criterion for an outlier,
there are no outliers in this data set.
Standardized Values for Apartment Rents
-1.20 -1.11 -1.11 -1.02 -1.02 -1.02 -1.02 -1.02 -0.93 -0.93
-0.93 -0.93 -0.93 -0.84 -0.84 -0.84 -0.84 -0.84 -0.75 -0.75
-0.75 -0.75 -0.75 -0.75 -0.75 -0.56 -0.56 -0.56 -0.47 -0.47
-0.47 -0.38 -0.38 -0.34 -0.29 -0.29 -0.29 -0.20 -0.20 -0.20
-0.20 -0.11 -0.01 -0.01 -0.01 0.17 0.17 0.17 0.17 0.35
0.35 0.44 0.62 0.62 0.62 0.81 1.06 1.08 1.45 1.45
1.54 1.54 1.63 1.81 1.99 1.99 1.99 1.99 2.27 2.27
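The outlier screen above (flag any value with |z| > 3) can be sketched as:

```python
# Standardize the 70 rents and flag |z| > 3 as outliers.
rents = [425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
         440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
         450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
         465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
         480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
         510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
         575, 575, 580, 590, 600, 600, 600, 600, 615, 615]

n = len(rents)
mean = sum(rents) / n
s = (sum((x - mean) ** 2 for x in rents) / (n - 1)) ** 0.5

z = [(x - mean) / s for x in rents]           # z-score of each rent
print(round(min(z), 2), round(max(z), 2))     # most extreme z-scores
outliers = [x for x, zi in zip(rents, z) if abs(zi) > 3]
print(outliers)                               # empty list: no outliers
```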
Five-Number Summary
Box Plot
Smallest Value
First Quartile
Median
Third Quartile
Largest Value
Five-Number Summary
Lowest Value = 425
First Quartile = 445
Median = 475
Third Quartile = 525
Largest Value = 615
Box Plot
Lower Limit: Q1 − 1.5(IQR) = 445 − 1.5(80) = 325
Upper Limit: Q3 + 1.5(IQR) = 525 + 1.5(80) = 645
There are no outliers.
x̄ = Σwixi / Σwi
where:
xi = value of observation i
wi = weight for observation i
Sample Data
x̄ = ΣfiMi / n
Population Data
μ = ΣfiMi / N
where:
fi = frequency of class i
Mi = midpoint of class i
Sample Data
s² = Σfi(Mi − x̄)² / (n − 1)
Population Data
σ² = Σfi(Mi − μ)² / N
s = √3,017.89 = 54.94
This approximation differs by only $.20
from the actual standard deviation of $54.74.
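The grouped-data formulas can be sketched as follows, assuming the 70 rents are grouped into $20-wide classes starting at 420 (an assumption: the class table itself is not shown on these slides, but these classes reproduce the $54.94 figure):

```python
# Grouped-data mean and variance using class midpoints Mi and
# frequencies fi (classes assumed: $20 wide, starting at 420).
classes = [(420, 439, 8), (440, 459, 17), (460, 479, 12), (480, 499, 8),
           (500, 519, 7), (520, 539, 4), (540, 559, 2), (560, 579, 4),
           (580, 599, 2), (600, 619, 6)]

n = sum(f for _, _, f in classes)                      # total frequency
mids = [(lo + hi) / 2 for lo, hi, _ in classes]        # class midpoints Mi
mean = sum(f * m for (_, _, f), m in zip(classes, mids)) / n
s2 = sum(f * (m - mean) ** 2
         for (_, _, f), m in zip(classes, mids)) / (n - 1)
print(round(mean, 2), round(s2, 2), round(s2 ** 0.5, 2))
```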
Finite Population
• A simple random sample from a finite population
of size N is a sample selected such that each
possible sample of size n has the same probability
of being selected.
• Replacing each sampled element before selecting
subsequent elements is called sampling with
replacement.
Finite Population
• Sampling without replacement is the procedure
used most often.
• In large sampling projects, computer-generated
random numbers are often used to automate the
sample selection process.
Infinite Population
• A simple random sample from an infinite
population is a sample selected such that the
following conditions are satisfied.
• Each element selected comes from the same
population.
• Each element is selected independently.
Infinite Population
• The population is usually considered infinite if it
involves an ongoing process that makes listing or
counting every element impossible.
• The random number selection procedure cannot
be used for infinite populations.
• Population Mean
μ = Σxi/900 = 990
• Population Standard Deviation
σ = √(Σ(xi − μ)²/900) = 80
Sample Data
Random
No. Number Applicant SAT Score On-Campus
1 744 Connie Reyman 1025 Yes
2 436 William Fox 950 Yes
3 865 Fabian Avante 1090 No
4 790 Eric Paxton 1120 Yes
5 835 Winona Wheeler 1015 No
. . . . .
30 685 Kevin Cossack 965 No
Point Estimates
• x̄ as Point Estimator of μ
x̄ = Σxi/30 = 29,910/30 = 997
• s as Point Estimator of σ
s = √(Σ(xi − x̄)²/29) = √(163,996/29) = 75.2
• p̄ as Point Estimator of p
p̄ = 20/30 = .67
Point Estimates
Note: Different random numbers would have
identified a different sample which would have
resulted in different point estimates.
Standard Deviation of x̄
Finite Population: σx̄ = (σ/√n)√((N − n)/(N − 1))
Infinite Population: σx̄ = σ/√n
• A finite population is treated as being infinite if n/N < .05.
• √((N − n)/(N − 1)) is the finite population correction factor.
• σx̄ is referred to as the standard error of the mean.
σx̄ = σ/√n = 80/√30 = 14.6
Sampling Distribution of x̄
E(x̄) = 990, σx̄ = 14.6
[Normal curve: shaded area between x̄ = 980 and x̄ = 1000]
Area = 2(.2518) = .5036
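The area above can be checked without normal tables using the standard normal CDF built from math.erf; the slide's .5036 comes from rounding z to .68, so the exact computation differs slightly in the third decimal:

```python
import math

def norm_cdf(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

se = 80 / math.sqrt(30)               # standard error of the mean, ~14.6
z = (1000 - 990) / se                 # ~.68
prob = norm_cdf(z) - norm_cdf(-z)     # symmetric interval around E(x-bar)
print(round(prob, 4))
```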
E(p̄) = p
where:
p = the population proportion
Standard Deviation of p̄
Finite Population: σp̄ = √(p(1 − p)/n)√((N − n)/(N − 1))
Infinite Population: σp̄ = √(p(1 − p)/n)
• σp̄ is referred to as the standard error of the proportion.
np > 5
and
n(1 – p) > 5
σp̄ = √(.72(1 − .72)/30) = .082
E(p̄) = .72, σp̄ = .082
Sampling Distribution of p̄
[Normal curve: shaded area between p̄ = .67 and p̄ = .77]
Area = 2(.2291) = .4582
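The same check applies to the proportion: with p = .72 and n = 30, the probability that p̄ falls between .67 and .77 under the normal approximation is:

```python
import math

def norm_cdf(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

p, n = 0.72, 30
se = math.sqrt(p * (1 - p) / n)       # standard error of the proportion, ~.082
z = 0.05 / se                         # ~.61
prob = norm_cdf(z) - norm_cdf(-z)
print(round(prob, 4))
```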
Population Condition
Conclusion        H0 True          Ha True
Accept H0         Correct          Type II Error
Reject H0         Type I Error     Correct
Test Statistic
σ Known: z = (x̄ − μ0)/(σ/√n)
σ Unknown: z = (x̄ − μ0)/(s/√n)
Rejection Rule
Upper-tail test: Reject H0 if z > zα
Lower-tail test: Reject H0 if z < −zα
[Sampling distribution of x̄, assuming H0 is true and μ = 12: rejection region above the critical value c, i.e. for z > 1.645; the observed z = 2.47 falls in the rejection region]
Rejection Rule
Reject H0 if |z| > zα/2
[Standard normal curve: two-tailed rejection regions beyond z = −1.96 and z = +1.96]
[t distribution: rejection region for t > 1.753 (critical value)]
[Summary: with n ≥ 30, use z = (x̄ − μ0)/(σ/√n) when σ is known, or z = (x̄ − μ0)/(s/√n) when it is not; with n < 30, use t = (x̄ − μ0)/(s/√n) or increase n to 30 or more]
where:
σp̄ = √(p0(1 − p0)/n)
ANOVA
Estimation of the Difference Between the Means
of Two Populations: Independent Samples
Point Estimator of the Difference between the Means
of Two Populations
Sampling Distribution of x̄1 − x̄2
Interval Estimate of μ1 − μ2: Large-Sample Case
Interval Estimate of μ1 − μ2: Small-Sample Case
σx̄1−x̄2 = √(σ1²/n1 + σ2²/n2)
x̄1 − x̄2 ± zα/2 σx̄1−x̄2
where:
1 - is the confidence coefficient
where:
sx̄1−x̄2 = √(s1²/n1 + s2²/n2)
Population 1: Par, Inc. golf balls
μ1 = mean driving distance of Par golf balls
Population 2: Rap, Ltd. golf balls
μ2 = mean driving distance of Rap golf balls
μ1 − μ2 = difference between the mean distances
Simple random sample of n1 Par golf balls: x̄1 = sample mean distance for the Par sample
Simple random sample of n2 Rap golf balls: x̄2 = sample mean distance for the Rap sample
x̄1 − x̄2 = point estimate of μ1 − μ2
x̄1 − x̄2 ± zα/2 σx̄1−x̄2
where:
σx̄1−x̄2 = √(σ²(1/n1 + 1/n2))

x̄1 − x̄2 ± t.025 sx̄1−x̄2 = 2.5 ± 2.101 √(5.28(1/12 + 1/8))
= 2.5 ± 2.2, or .3 to 4.7 miles per gallon.
We are 95% confident that the difference between the
mean mpg ratings of the two car types is from .3 to
4.7 mpg (with the M car having the higher mpg).
Test Statistic
Large-Sample: z = ((x̄1 − x̄2) − (μ1 − μ2)) / √(σ1²/n1 + σ2²/n2)
Small-Sample: t = ((x̄1 − x̄2) − (μ1 − μ2)) / √(s²(1/n1 + 1/n2))
• Hypotheses
H0: μd = 0, Ha: μd ≠ 0
MSTR = Σ nj(x̄j − x̄)² / (k − 1)
MSE = Σ (nj − 1)sj² / (nT − k)
where x̄j and sj² are the mean and variance of sample j, and x̄ is the overall sample mean.
Hypotheses
H0: μ1 = μ2 = μ3 = . . . = μk
Ha: Not all population means are equal
Test Statistic
F = MSTR/MSE
Rejection Rule
Reject H0 if F > Fα
where the value of Fα is based on an F distribution with k − 1 numerator degrees of freedom and nT − k denominator degrees of freedom.
Analysis of Variance
J. R. Reed would like to know if the mean number
of hours worked per week is the same for the
department managers at her three manufacturing
plants (Buffalo, Pittsburgh, and Detroit).
A simple random sample of 5 managers from
each of the three plants was taken and the number of
hours worked by each manager for the previous
week is shown on the next slide.
Analysis of Variance
Plant 1 Plant 2 Plant 3
Observation Buffalo Pittsburgh Detroit
1 48 73 51
2 54 63 63
3 57 66 61
4 54 64 54
5 62 74 56
Sample Mean 55 68 57
Sample Variance 26.0 26.5 24.5
Analysis of Variance
• Hypotheses
H0: μ1 = μ2 = μ3
Ha: Not all the means are equal
where:
μ1 = mean number of hours worked per week by the managers at Plant 1
μ2 = mean number of hours worked per week by the managers at Plant 2
μ3 = mean number of hours worked per week by the managers at Plant 3
Analysis of Variance
• Mean Square Treatment
Since the sample sizes are all equal:
x̄ = (55 + 68 + 57)/3 = 60
SSTR = 5(55 − 60)² + 5(68 − 60)² + 5(57 − 60)² = 490
MSTR = 490/(3 − 1) = 245
• Mean Square Error
SSE = 4(26.0) + 4(26.5) + 4(24.5) = 308
MSE = 308/(15 - 3) = 25.667
Analysis of Variance
• F - Test
If H0 is true, the ratio MSTR/MSE should be near 1 since both MSTR and MSE are estimating σ². If Ha is true, the ratio should be significantly larger than 1 since MSTR tends to overestimate σ².
Analysis of Variance
• Rejection Rule
Assuming α = .05, F.05 = 3.89 (2 d.f. numerator, 12 d.f. denominator). Reject H0 if F > 3.89.
• Test Statistic
F = MSTR/MSE = 245/25.667 = 9.55
Analysis of Variance
• ANOVA Table
Source        SS    df   MS       F
Treatments    490   2    245      9.55
Error         308   12   25.667
Total         798   14
Analysis of Variance
• Conclusion
F = 9.55 > F.05 = 3.89, so we reject H0. The mean
number of hours worked per week by department
managers is not the same at each plant.
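The full ANOVA computation above can be sketched in a few lines (a minimal illustration using the three plant samples from the slides):

```python
# One-way ANOVA for the hours-worked data: 5 managers at each plant.
plants = {
    "Buffalo":    [48, 54, 57, 54, 62],
    "Pittsburgh": [73, 63, 66, 64, 74],
    "Detroit":    [51, 63, 61, 54, 56],
}

k = len(plants)                                  # number of treatments
nT = sum(len(v) for v in plants.values())        # total observations
grand_mean = sum(sum(v) for v in plants.values()) / nT

# Between-treatments and within-treatments sums of squares.
sstr = sum(len(v) * (sum(v) / len(v) - grand_mean) ** 2
           for v in plants.values())
sse = sum(sum((x - sum(v) / len(v)) ** 2 for x in v)
          for v in plants.values())

mstr = sstr / (k - 1)
mse = sse / (nT - k)
F = mstr / mse
print(round(mstr, 3), round(mse, 3), round(F, 2))
```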
H0: p1 − p2 = 0
Ha: p1 − p2 ≠ 0
Expected Value
E(p̄1 − p̄2) = p1 − p2
Standard Deviation
σp̄1−p̄2 = √(p1(1 − p1)/n1 + p2(1 − p2)/n2)
Distribution Form
If the sample sizes are large (n1p1, n1(1 - p1), n2p2,
and n2(1 - p2) are all greater than or equal to 5), the
sampling distribution of p̄1 − p̄2 can be approximated
by a normal probability distribution.
[Sampling distribution of p̄1 − p̄2: mean p1 − p2, standard deviation σp̄1−p̄2 = √(p1(1 − p1)/n1 + p2(1 − p2)/n2)]
Interval Estimate
p̄1 − p̄2 ± zα/2 σp̄1−p̄2
Point Estimator of σp̄1−p̄2
sp̄1−p̄2 = √(p̄1(1 − p̄1)/n1 + p̄2(1 − p̄2)/n2)
.08 ± 1.96(.0510)
.08 ± .10
-.02 to +.18
Hypotheses
H0: p1 − p2 ≤ 0
Ha: p1 − p2 > 0
Test statistic
z = ((p̄1 − p̄2) − (p1 − p2)) / sp̄1−p̄2
where:
sp̄1−p̄2 = √(p̄(1 − p̄)(1/n1 + 1/n2))
p̄ = (n1p̄1 + n2p̄2)/(n1 + n2)

z = ((.48 − .40) − 0)/.0514 = .08/.0514 = 1.56
5. Reject H0 if χ² > χ²α (where α is the significance level and there are k − 1 degrees of freedom).
• Test Statistic
χ² = (30 − 25)²/25 + (20 − 25)²/25 + (35 − 25)²/25 + (15 − 25)²/25
= 1 + 1 + 4 + 4
= 10
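The goodness-of-fit statistic above reduces to a one-line sum over observed and expected counts:

```python
# Chi-square goodness-of-fit: four observed counts against equal
# expected counts of 25 each.
observed = [30, 20, 35, 15]
expected = [25, 25, 25, 25]

chi2 = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
print(chi2)   # 10.0
```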
χ² = Σi Σj (fij − eij)² / eij
y = β0 + β1x + ε
E(y) = β0 + β1x
[Graphs of E(y) = β0 + β1x: a regression line with intercept β0 and positive slope β1; a regression line with negative slope β1; and, for no relationship, a horizontal regression line with slope β1 = 0]
Estimated Regression Equation
ŷ = b0 + b1x
The sample statistics b0 and b1 provide estimates of β0 and β1.
min Σ(yi − ŷi)²
where:
yi = observed value of the dependent variable
for the ith observation
y^i = estimated value of the dependent variable
for the ith observation
b1 = (Σxiyi − (Σxi)(Σyi)/n) / (Σxi² − (Σxi)²/n)
b0 = ȳ − b1x̄
where:
xi = value of independent variable for ith observation
yi = value of dependent variable for ith observation
x̄ = mean value for independent variable
ȳ = mean value for dependent variable
n = total number of observations
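The slope and intercept formulas above can be sketched directly. The five (x, y) pairs used here are recovered from the TV Ads example later in the deck (ŷ = 10 + 5x together with the residual table imply x = 1, 3, 2, 1, 3 and y = 14, 24, 18, 17, 27):

```python
# Least squares slope and intercept from the summation formulas.
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
n = len(x)

sxy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
sxx = sum(a * a for a in x) - sum(x) ** 2 / n
b1 = sxy / sxx                      # slope
b0 = sum(y) / n - b1 * sum(x) / n   # intercept
print(b0, b1)                       # 10.0 5.0
```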
Scatter Diagram
[Scatter diagram: TV Ads (x, 0-4) against Cars Sold (y, 0-30), with the estimated regression line ŷ = 10 + 5x]
SST = SSR + SSE:  Σ(yi − ȳ)² = Σ(ŷi − ȳ)² + Σ(yi − ŷi)²
where:
SST = total sum of squares
SSR = sum of squares due to regression
SSE = sum of squares due to error
r2 = SSR/SST
where:
SST = total sum of squares
SSR = sum of squares due to regression
Coefficient of Determination
r2 = SSR/SST = 100/114 = .8772
The regression relationship is very strong because
88% of the variation in number of cars sold can be
explained by the linear relationship between the
number of TV ads and the number of cars sold.
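The sums of squares behind r² = 100/114 can be checked with a minimal sketch over the same five points:

```python
# SST, SSR, SSE, and r-squared for the TV Ads / Cars Sold data,
# using the fitted line y-hat = 10 + 5x.
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]

y_bar = sum(y) / len(y)
y_hat = [10 + 5 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)                   # total
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)               # regression
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))      # error
r2 = ssr / sst
print(sst, ssr, sse, round(r2, 4))
```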
rxy = (sign of b1)√r²
where:
b1 = the slope of the estimated regression equation ŷ = b0 + b1x
rxy = (sign of b1)√r²
The sign of b1 in the equation ŷ = 10 + 5x is "+".
rxy = +√.8772 = +.9366
An Estimate of σ²
The mean square error (MSE) provides the estimate of σ²; the notation s² is also used.
s² = MSE = SSE/(n − 2)
where:
SSE = Σ(yi − ŷi)² = Σ(yi − b0 − b1xi)²
An Estimate of σ
• To estimate σ we take the square root of s².
• The resulting s is called the standard error of the estimate.
s = √MSE = √(SSE/(n − 2))
Hypotheses
H0: β1 = 0
Ha: β1 ≠ 0
Test Statistic
t = b1/sb1
Rejection Rule
Reject H0 if t < −tα/2 or t > tα/2
t Test
• Hypotheses
H0: β1 = 0
Ha: β1 ≠ 0
• Rejection Rule
For α = .05 and d.f. = 3, t.025 = 3.182
Reject H0 if |t| > 3.182
t Test
• Test Statistic
t = 5/1.08 = 4.63
• Conclusion
t = 4.63 > 3.182, so we reject H0.
where tα/2 is the t value providing an area of α/2 in the upper tail of a t distribution with n − 2 degrees of freedom.
Rejection Rule
Reject H0 if 0 is not included in the confidence interval for β1.
95% Confidence Interval for β1
b1 ± tα/2 sb1 = 5 ± 3.182(1.08) = 5 ± 3.44
or 1.56 to 8.44
Conclusion
0 is not included in the confidence interval.
Reject H0
Hypotheses
H0: β1 = 0
Ha: β1 ≠ 0
Test Statistic
F = MSR/MSE
Rejection Rule
Reject H0 if F > Fα
F Test
• Hypotheses
H0: β1 = 0
Ha: β1 ≠ 0
• Rejection Rule
For α = .05 and d.f. = 1, 3: F.05 = 10.13
Reject H0 if F > 10.13.
F Test
• Test Statistic
F = MSR/MSE = 100/4.667 = 21.43
• Conclusion
F = 21.43 > 10.13, so we reject H0.
ŷp ± tα/2 sind
Point Estimation
If 3 TV ads are run prior to a sale, we expect the mean
number of cars sold to be:
y^ = 10 + 5(3) = 25 cars
Residuals
Observation Predicted Cars Sold Residuals
1 15 -1
2 25 -1
3 20 -2
4 15 2
5 25 2
Residual Plot
[Residual plot: residuals (−2 to 2) plotted against TV Ads (0-4)]
Residual Plot Against ŷ
[Good pattern: residuals form a horizontal band around zero]
[Nonconstant variance: the spread of the residuals increases with ŷ]
[Model form not adequate: the residuals follow a curved pattern]
Estimated Multiple Regression Equation
ŷ = b0 + b1x1 + b2x2 + . . . + bpxp
The sample statistics b0, b1, b2, . . . , bp provide estimates of β0, β1, β2, . . . , βp.
SST = SSR + SSE:  Σ(yi − ȳ)² = Σ(ŷi − ȳ)² + Σ(yi − ŷi)²
R 2 = SSR/SST
Ra² = 1 − (1 − R²)(n − 1)/(n − p − 1)
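The adjusted R² formula can be checked against the programmer-salary output shown later in this section; n = 20 is an inference from the 17 error degrees of freedom reported there (d.f. = n − p − 1 with p = 2):

```python
# Adjusted R-squared: R2 = .834, p = 2 predictors, n = 20 observations
# (n inferred from the 17 error d.f. in the printout).
n, p, r2 = 20, 2, 0.834

ra2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(round(ra2, 3))
```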
where
y = annual salary ($000)
x1 = years of experience
x2 = score on programmer aptitude test
Least Squares
Input data (x1, x2, y): (4, 78, 24), (7, 100, 43), . . . , (3, 89, 30)
→ Computer package for solving multiple regression problems →
Output: b0, b1, b2, R², etc.
The regression equation is
Salary = 3.174 + 1.404 Exper + 0.251 Score

Predictor    Coef      Stdev     t-ratio   p
Constant     3.174     6.156     .52       .613
Exper        1.4039    .1986     7.07      .000
Score        .25089    .07735    3.24      .005

s = 2.419   R-sq = 83.4%   R-sq(adj) = 81.5%
Hypotheses
H0: β1 = β2 = . . . = βp = 0
Ha: One or more of the parameters is not equal to zero.
Test Statistic
F = MSR/MSE
Rejection Rule
Reject H0 if F > Fα
where Fα is based on an F distribution with p d.f. in the numerator and n − p − 1 d.f. in the denominator.
Hypotheses
H0: βi = 0
Ha: βi ≠ 0
Test Statistic
t = bi/sbi
Rejection Rule
Reject H0 if t < −tα/2 or t > tα/2
where tα/2 is based on a t distribution with n − p − 1 degrees of freedom.
F Test
• Hypotheses
H0: β1 = β2 = 0
Ha: One or both of the parameters
is not equal to zero.
• Rejection Rule
For α = .05 and d.f. = 2, 17:
F.05 = 3.59
Reject H0 if F > 3.59.
F Test
• Test Statistic
F = MSR/MSE
= 250.16/5.85 = 42.76
• Conclusion
F = 42.76 > 3.59, so we reject H0.
t Test
• Hypotheses
H0: βi = 0
Ha: βi ≠ 0
• Rejection Rule
For α = .05 and d.f. = 17: t.025 = 2.11
Reject H0 if |t| > 2.11
• Test Statistics
t = b1/sb1 = 1.4039/.1986 = 7.07
t = b2/sb2 = .25089/.07735 = 3.24
• Conclusions
Reject H0: β1 = 0 and reject H0: β2 = 0.
Both independent variables are significant.