BADM 221 Unit 10 - With Notes


BADM 221 Statistics for Business

Week 10
ANOVA (Analysis of Variance)
ANOVA

Test of several means – n-sample hypothesis test

Many statistical applications in psychology, social science,
business administration, and the natural sciences involve several
groups.

Example:
• An experiment to study the effects of five different brands of
gasoline on car engine efficiency.
• A consumer looking for a new car might compare the average
gas mileage of seven car models.
• A professor wishes to study the effect of four different teaching
techniques on mathematics proficiency.
ANOVA
The characteristic that differentiates the treatments from one
another is called the factor of the study. The different
treatments are called the levels of the factor. Here, we only
consider one factor.
Example:
• An experiment to study the effects of five different brands of gasoline
on car engine efficiency.
Factor: Gasoline brand    Treatments: The 5 different brands
• A consumer looking for a new car might compare the average gas
mileage of seven car models.
Factor: Car model    Treatments: The 7 car models
• A professor wishes to study the effect of four different teaching
techniques on mathematics proficiency.
Factor: Teaching technique    Treatments: The 4 different techniques
ANOVA

For hypothesis tests comparing averages among more than two
groups, statisticians have developed a method called
"Analysis of Variance" (abbreviated ANOVA).

One-way ANOVA
(Single-factor ANOVA)

The purpose of an ANOVA test is to determine whether
there is any significant difference among several group
means. The test uses variances to help determine if the
means are equal or not.
ANOVA

Two kinds of variance (sources of variation):

• Variance between treatments:
Variation due to the different levels of the factor
(termed the Sum of Squares of Treatment/Factor)
SS(Treatment) or SS(Factor)
• Variance within treatments:
Variation due to error
(termed the Sum of Squares of Error)
SS(Error)
ANOVA

Null and Alternative Hypotheses

H0: All the population means are the same.
Ha: At least one of the means is different.

Suppose we want to compare k groups.

H0: The population means of all k groups are the same.
Ha: At least one group has a different mean.

H0: μ1 = μ2 = ⋯ = μk
Ha: At least one μi is different from the others.
ANOVA
Data are typically put into a table for easy referencing by
computer software. The table is called the ANOVA table.

Number of treatments: k    Total number of data: n

Source of Variation       | Sum of Squares (SS)         | Degrees of Freedom (df) | Mean Square (MS)                | F
Between Treatments        | SS(Factor) or SS(Treatment) | k – 1                   | MS(Factor) = SS(Factor)/(k – 1) | F = MS(Factor)/MS(Error)
Error (Within Treatments) | SS(Error)                   | n – k                   | MS(Error) = SS(Error)/(n – k)   |
Total                     | SS(Total)                   | n – 1                   |                                 |
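The formulas in this table translate directly into code. A minimal sketch in Python (the function name and returned dictionary keys are illustrative, not from the slides):

```python
def one_way_anova_table(groups):
    """Compute the one-way ANOVA table entries for a list of samples."""
    k = len(groups)                          # number of treatments
    n = sum(len(g) for g in groups)          # total number of observations
    grand_mean = sum(x for g in groups for x in g) / n

    # SS(Factor): variation of the treatment means about the grand mean
    ss_factor = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # SS(Error): variation of the observations inside each treatment
    ss_error = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

    ms_factor = ss_factor / (k - 1)          # MS(Factor) = SS(Factor) / (k - 1)
    ms_error = ss_error / (n - k)            # MS(Error)  = SS(Error) / (n - k)
    return {"SS(Factor)": ss_factor, "SS(Error)": ss_error,
            "SS(Total)": ss_factor + ss_error,
            "df1": k - 1, "df2": n - k,
            "MS(Factor)": ms_factor, "MS(Error)": ms_error,
            "F": ms_factor / ms_error}
```

Applied to the diet-plan example that follows, this reproduces the table values shown there.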
ANOVA
Example:
Three different diet plans are to be tested for mean weight loss. The
entries in the table are the weight losses for the different plans.
Plan 1 | Plan 2 | Plan 3
5      | 3.5    | 8
4.5    | 7      | 4
4      | 4.5    | 3.5
3      |        |

The resulting (partial) ANOVA table is shown below:

Source of Variation       | Sum of Squares | Degrees of Freedom | Mean Square | F
Between Treatments        | 2.2458         |                    |             |
Error (Within Treatments) | 20.8542        |                    |             |
Total                     |                |                    |             |
ANOVA
Number of treatments: k = ___    Total number of data: n = ___

Source of Variation       | Sum of Squares | Degrees of Freedom | Mean Square | F
Between Treatments        | 2.2458         |                    |             |
Error (Within Treatments) | 20.8542        |                    |             |
Total                     |                |                    |             |

Source of Variation       | Sum of Squares (SS)         | Degrees of Freedom (df) | Mean Square (MS)                | F
Between Treatments        | SS(Factor) or SS(Treatment) | k – 1                   | MS(Factor) = SS(Factor)/(k – 1) | F = MS(Factor)/MS(Error)
Error (Within Treatments) | SS(Error)                   | n – k                   | MS(Error) = SS(Error)/(n – k)   |
Total                     | SS(Total)                   | n – 1                   |                                 |
ANOVA
Example (continued):
Three different diet plans are to be tested for mean weight loss. The
entries in the table are the weight losses for the different plans.
Plan 1 | Plan 2 | Plan 3
5      | 3.5    | 8
4.5    | 7      | 4
4      | 4.5    | 3.5
3      |        |

Test the hypothesis that the mean weight losses of the 3 diet plans are
the same, at the 5% level of significance.
ANOVA
Hypothesis Testing:

H0: The population mean weight losses of the three diet
plans are ALL the same.
Ha: At least one of the diet plans has a different mean
weight loss.

ANOVA table:
Source of Variation       | Sum of Squares | Degrees of Freedom | Mean Square | F
Between Treatments        | 2.2458         | 2                  | 1.1229      | 0.3769
Error (Within Treatments) | 20.8542        | 7                  | 2.9792      |
Total                     | 23.1           | 9                  |             |
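The F statistic in the table can be cross-checked in software; a sketch using SciPy's `f_oneway` (assumes SciPy is installed):

```python
from scipy.stats import f_oneway

# Weight-loss data for the three diet plans
plan1 = [5, 4.5, 4, 3]
plan2 = [3.5, 7, 4.5]
plan3 = [8, 4, 3.5]

result = f_oneway(plan1, plan2, plan3)
print(round(result.statistic, 4))   # F test statistic, matching the ANOVA table
```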
ANOVA
Hypothesis Testing:
Source of Variation       | Sum of Squares | Degrees of Freedom | Mean Square | F
Between Treatments        | 2.2458         | 2                  | 1.1229      | 0.3769
Error (Within Treatments) | 20.8542        | 7                  | 2.9792      |
Total                     | 23.1           | 9                  |             |

[Figure: F-distribution density curves for several degree-of-freedom pairs (F3,5, F10,90, F50,50, F90,10)]
ANOVA
Hypothesis Testing:
Source of Variation       | Sum of Squares | Degrees of Freedom | Mean Square | F
Between Treatments        | 2.2458         | 2 (df1)            | 1.1229      | 0.3769
Error (Within Treatments) | 20.8542        | 7 (df2)            | 2.9792      |
Total                     | 23.1           | 9                  |             |

Critical value: F(df1, df2) = F(2, 7) = 4.7375
Test statistic: Fc = 0.3769
ANOVA
Hypothesis Testing:
Reject H0 if (Test Statistic > Critical value)
Do not reject H0 if (Test Statistic ≤ Critical value)

Source of Variation       | Sum of Squares | Degrees of Freedom | Mean Square | F
Between Treatments        | 2.2458         | 2 (df1)            | 1.1229      | 0.3769
Error (Within Treatments) | 20.8542        | 7 (df2)            | 2.9792      |
Total                     | 23.1           | 9                  |             |

Critical value: F(df1, df2) = F(2, 7) = 4.7375
Test statistic: Fc = 0.3769

Fc < F(2, 7) ⇒ Do not reject H0.

⇒ There is insufficient evidence that at least one of the
diet plans has a different mean weight loss.
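The critical value is read from an F-table in the slides; it can also be obtained in software. A sketch of the decision rule using SciPy (assumes SciPy is installed):

```python
from scipy.stats import f

alpha = 0.05
df1, df2 = 2, 7          # k - 1 = 2 and n - k = 7 for the diet-plan example
f_c = 0.3769             # test statistic from the ANOVA table

critical = f.ppf(1 - alpha, df1, df2)   # right-tail critical value of the F-distribution
decision = "Reject H0" if f_c > critical else "Do not reject H0"
print(decision)   # Do not reject H0
```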
ANOVA

1. H0: The population mean weight losses of the three diet
   plans are ALL the same.
   Ha: At least one of the diet plans has a different mean
   weight loss.

2. Test statistic: Fc = 0.3769

3. Critical value: At the 5% level of significance, F(2, 7) = 4.7375

4. Fc < F(2, 7) ⇒ Do not reject H0

5. Conclusion: Do not reject H0 at the 5% level of significance.
   ⇒ There is insufficient evidence that at least one of the diet
   plans has a different mean weight loss.
ANOVA
Example:
As part of an experiment to see how different types of soil cover
would affect slicing tomato production, Douglas College students
grew tomato plants under different soil cover conditions. Groups of
three plants each had one of the 5 treatments (i.e. a total of 15 plants).
All plants grew under the same conditions and were the same variety.
Students recorded the weight (in grams) of tomatoes produced by
each of the plants and the results are summarized in an ANOVA table:
Source of Variation       | Sum of Squares | Degrees of Freedom | Mean Square | F
Between Treatments        | 36,648,561     |                    |             |
Error (Within Treatments) |                |                    |             |
Total                     | 57,095,287     |                    |             |

At the 0.05 level of significance, conduct a hypothesis test to
determine if all treatment means are the same.
ANOVA
Source of Variation       | Sum of Squares | Degrees of Freedom | Mean Square  | F
Between Treatments        | 36,648,561     | 4                  | 9,162,140.25 | 4.481
Error (Within Treatments) | 20,446,726     | 10                 | 2,044,672.6  |
Total                     | 57,095,287     | 14                 |              |

1. H0: The population means of all 5 treatments are the same.
   Ha: At least one treatment has a different mean.

2. Test statistic: Fc = 4.481

3. Critical value: At the 5% level of significance, F(4, 10) = 3.478

4. Fc > F(4, 10) ⇒ Reject H0

5. Conclusion: Reject H0 at the 5% level of significance.
   ⇒ There is sufficient evidence that at least one treatment
   has a different mean.
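The previously blank cells follow from the identity SS(Total) = SS(Factor) + SS(Error) together with the df and MS formulas. A quick arithmetic check in Python:

```python
# Tomato example: k = 5 treatments, 3 plants each, n = 15
k, n = 5, 15
ss_factor = 36_648_561
ss_total = 57_095_287

ss_error = ss_total - ss_factor      # SS(Error) = SS(Total) - SS(Factor)
ms_factor = ss_factor / (k - 1)      # MS(Factor)
ms_error = ss_error / (n - k)        # MS(Error)
f_stat = ms_factor / ms_error
print(ss_error, round(f_stat, 3))    # 20446726 and F = 4.481
```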
ANOVA

Example:
In a completely randomized experimental design, 7 experimental
units were used for each of the 4 levels of the factor:

Source of Variation       | Sum of Squares | Degrees of Freedom | Mean Square | F
Between Treatments        |                |                    |             |
Error (Within Treatments) | 24,000         |                    |             |
Total                     | 38,301         |                    |             |

Complete the ANOVA table and test the hypothesis that the
population treatment means are all the same, at α = 0.05.
ANOVA
Source of Variation       | Sum of Squares | Degrees of Freedom | Mean Square | F
Between Treatments        | 14,301         | 3                  | 4,767       | 4.767
Error (Within Treatments) | 24,000         | 24                 | 1,000       |
Total                     | 38,301         | 27                 |             |

1. H0: The population means of all 4 treatments are the same.
   Ha: At least one treatment has a different mean.

2. Test statistic: Fc = 4.767

3. Critical value: At α = 0.05, F(3, 24) ≈ F(3, 20) = 3.0983 (nearest table entry)

4. Fc > F(3, 20) ⇒ Reject H0

5. Conclusion: Reject H0 at the 5% level of significance.
   ⇒ There is sufficient evidence that at least one treatment has a
   different mean.
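The slides use the critical-value approach; an equivalent alternative, not shown in the slides, is to compute the p-value of the test statistic. A sketch using SciPy (assumes SciPy is installed):

```python
from scipy.stats import f

# Completely randomized design: F = 4.767 with df1 = 3, df2 = 24
p_value = f.sf(4.767, 3, 24)    # right-tail area beyond the test statistic

# Same decision as comparing Fc with the critical value
print("Reject H0" if p_value <= 0.05 else "Do not reject H0")   # Reject H0
```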

BADM 221 Statistics for Business

Unit 11
Linear Regression
Linear Regression

Regression is a statistical technique that uses the idea
that one variable may be related to one or more variables
through an equation.

Here we consider the relationship of two variables only, in a
straight-line relationship, which is called simple linear
regression.
Linear Regression

Simple linear regression uses the relationship between the
two variables to obtain information about one variable by
knowing the values of the other.

The equation showing this type of relationship is called the
linear regression equation.
Linear Regression

Linear equation: y = mx + b
(m = slope, b = y-intercept)

Example: y = 2x – 1
Slope = 2
y-intercept = –1

[Figure: graph of the line y = 2x – 1, crossing the y-axis at –1]
Linear Regression

We want to use X to predict (or estimate) the value of Y that
might be obtained without actually measuring it, provided
the relationship between the two can be expressed by a line.

"X" is usually called the independent variable and "Y" is
called the dependent variable.

[Figure: scatter plot of Statistics Score (Y) against Mathematics Score (X)]
Linear Regression
Example: The exam scores of a class of 9 students in
Mathematics ( X ) and in Statistics ( Y ) are shown
below:
Math Score (X) 80 58 92 60 75 63 93 76 78
Stat Score (Y) 78 64 96 62 78 65 90 61 82

[Figure: scatter plot of Statistics Score (Y) against Mathematics Score (X) for the 9 students]
Linear Regression
We want to determine the equation of the regression line
that best fits the data.

[Figure: four scatter plots of Statistics Score vs. Mathematics Score, each with a different candidate line drawn through the points]
Linear Regression
Equation of the regression line:
df SS
Regression 1 1004.483
Residual 7 301.517
Total 8 1306

Coefficients Standard Error t Stat p-value


Intercept 9.450 13.74 0.687 0.513
Math Score 0.872 0.1807 4.829 0.001

[Figure: scatter plot of the exam scores with the fitted regression line]
Linear Regression
Equation of the regression line:
df SS
Regression 1 1004.483
Residual 7 301.517
Total 8 1306

Coefficients Standard Error t Stat p-value


Intercept 9.450 13.74 0.687 0.513
Math Score 0.872 0.1807 4.829 0.001

Y  9.450  0.872 X
Statistics
Score

Stat Score  9.450  0.872  Math Score

Mathematics Score
29
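The coefficients reported in the output come from the least-squares formulas b = Sxy / Sxx and a = ȳ − b·x̄. A minimal sketch (function name illustrative; checked here on points that lie exactly on the line y = 2x − 1 rather than on the exam data):

```python
def least_squares(xs, ys):
    """Least-squares fit: slope b = Sxy / Sxx, intercept a = y-bar - b * x-bar."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx                 # slope
    a = y_bar - b * x_bar         # intercept
    return a, b

# Sanity check on points that lie exactly on y = 2x - 1
a, b = least_squares([0, 1, 2, 3], [-1, 1, 3, 5])
print(a, b)   # -1.0 2.0
```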
Linear Regression

We can then make predictions using the regression
equation:

Stat Score = 9.450 + 0.872 × (Math Score)

For example:

Score in Math | Estimated score in Stat
61            | 9.450 + 0.872 × 61 = 62.64
73            | 9.450 + 0.872 × 73 = 73.11
91            | 9.450 + 0.872 × 91 = 88.80
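These predictions are direct substitutions into the estimated equation; as a small helper (function name illustrative):

```python
def predict_stat_score(math_score):
    """Plug a Math score into the estimated regression equation."""
    return 9.450 + 0.872 * math_score

for x in (61, 73, 91):
    print(x, round(predict_stat_score(x), 2))
```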
Linear Regression

Is the regression relationship significant?

Null and Alternative Hypotheses

H0: There is no linear relationship between X and Y.
(The regression relationship is NOT significant.)
Ha: There is a linear relationship between X and Y.
(The regression relationship is significant.)
Linear Regression

Is the regression relationship significant?

Use the p-value approach:

Reject H0 if (p-value ≤ level of significance)
⇒ The regression relationship is significant.

Do not reject H0 if (p-value > level of significance)
⇒ The regression relationship is NOT significant.
Linear Regression

Is the regression relationship significant?

df SS
Regression 1 1004.483
Residual 7 301.517
Total 8 1306

Coefficients Standard Error t Stat p-value


Intercept 9.450 13.74 0.687 0.513
Math Score 0.872 0.1807 4.829 0.001

Which p-value?
Linear Regression

Is the regression relationship significant?

df SS
Regression 1 1004.483
Residual 7 301.517
Total 8 1306

Coefficients Standard Error t Stat p-value


Intercept 9.450 13.74 0.687 0.513
Math Score 0.872 0.1807 4.829 0.001

As an illustration: Take level of significance = 5%

The p-value for Math Score is 0.001 < the level of significance
 Reject H0  The regression relationship is significant.
Linear Regression

How good is the regression equation?

Coefficient of Determination, R²

R² = SS(Regression) / SS(Total)    (decimal → percent)

Interpreted as the percentage of the observed variation in Y
that can be explained by the variation in X.
Linear Regression
df SS
Regression 1 1004.483
Residual 7 301.517
Total 8 1306

Coefficients Standard Error t Stat p-value


Intercept 9.450 13.74 0.687 0.513
Math Score 0.872 0.1807 4.829 0.001

R² = 1004.483 / 1306 = 0.7691 = 76.91%

76.91% of the variability of the Statistics score can be explained
by the linear relationship with the Mathematics score.
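R² is a one-line ratio of the two sums of squares from the output:

```python
ss_regression = 1004.483
ss_total = 1306

r_squared = ss_regression / ss_total          # R^2 = SS(Regression) / SS(Total)
print(f"{r_squared:.4f} = {r_squared:.2%}")   # 0.7691 = 76.91%
```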
Linear Regression
Example:
A teacher wishes to investigate if there is any relationship
between a student’s exam score in Mathematics (X) and the
exam score in Accounting (Y). A sample of 11 students is
randomly selected and the results are summarized in the
ANOVA table below:

df SS
Regression 1 1305.68
Residual 9 81.96
Total 10 1387.64

Coefficients Standard Error t Stat p-value


Intercept 24.13 4.657 5.182 0.005
MathScore 0.759 0.063 11.974 0.001
Linear Regression
df SS
Regression 1 1305.68
Residual 9 81.96
Total 10 1387.64

Coefficients Standard Error t Stat p-value


Intercept 24.13 4.657 5.182 0.005
MathScore 0.759 0.063 11.974 0.001

What is the estimated regression equation that relates the exam score in
accounting (Y) to the score in mathematics (X)?

What is the estimated exam score in accounting if a student got a score of
80 in mathematics?
Linear Regression
df SS
Regression 1 1305.68
Residual 9 81.96
Total 10 1387.64

Coefficients Standard Error t Stat p-value


Intercept 24.13 4.657 5.182 0.005
MathScore 0.759 0.063 11.974 0.001

What is the estimated regression equation that relates the exam score in
accounting (Y) to the score in mathematics (X)?

Ŷ = 24.13 + 0.759 X
Acc. Score = 24.13 + 0.759 × (MathScore)

What is the estimated exam score in accounting if a student got a score of
80 in mathematics?

24.13 + 0.759 × 80 = 84.85
Linear Regression
df SS
Regression 1 1305.68
Residual 9 81.96
Total 10 1387.64

Coefficients Standard Error t Stat p-value


Intercept 24.13 4.657 5.182 0.005
MathScore 0.759 0.063 11.974 0.001

Is the regression relationship significant? Use the p-value approach and a 2%
level of significance.
Linear Regression
df SS
Regression 1 1305.68
Residual 9 81.96
Total 10 1387.64

Coefficients Standard Error t Stat p-value


Intercept 24.13 4.657 5.182 0.005
MathScore 0.759 0.063 11.974 0.001

Is the regression relationship significant? Use the p-value approach and a 2%
level of significance.

The p-value for MathScore is 0.001 < the level of significance
⇒ Reject H0 ⇒ The regression relationship is significant.
Linear Regression
df SS
Regression 1 1305.68
Residual 9 81.96
Total 10 1387.64

Coefficients Standard Error t Stat p-value


Intercept 24.13 4.657 5.182 0.005
MathScore 0.759 0.063 11.974 0.001

Compute the coefficient of determination between the exam score in
accounting and the exam score in mathematics. Interpret the result in the
context of the problem.
Linear Regression
df SS
Regression 1 1305.68
Residual 9 81.96
Total 10 1387.64

Coefficients Standard Error t Stat p-value


Intercept 24.13 4.657 5.182 0.005
MathScore 0.759 0.063 11.974 0.001

Compute the coefficient of determination between the exam score in
accounting and the exam score in mathematics. Interpret the result in the
context of the problem.

R² = 1305.68 / 1387.64 = 0.9409 = 94.09%

94.09% of the variability of the exam score in
accounting can be explained by the linear
relationship with the exam score in mathematics.
Linear Regression

           | df | SS
Regression | 1  | 1305.68
Residual   | 9  | 81.96
Total      | 10 | 1387.64

          | Coefficients | Standard Error | t Stat | p-value
Intercept | 24.13        | 4.657          | 5.182  | 0.005
MathScore | 0.759        | 0.063          | 11.974 | 0.001

Estimated regression equation:
Acc. Score = 24.13 + 0.759 × (MathScore)

Coefficient of determination:
R² = 1305.68 / 1387.64 = 0.9409 = 94.09%

Significance of the regression relationship?
p-value ≤ the level of significance ⇒ The regression relationship is significant.
p-value > the level of significance ⇒ The regression relationship is NOT significant.
Linear Regression
Example:
The accountant at Walmart wants to determine the
relationship between customer purchases at the store, Y ($),
and the customer monthly salary, X ($). A sample of 15
customers is randomly selected and the results are
summarized in the ANOVA table below:

df SS
Regression 1 186952
Residual 13 99236
Total 14 286188

Coefficients Standard Error t Stat p-value


Intercept 78.58 7.540 1.202 0.035
Salary 0.066 0.013 4.948 0.003
Linear Regression
df SS
Regression 1 186952
Residual 13 99236
Total 14 286188

Coefficients Standard Error t Stat p-value


Intercept 78.58 7.540 1.202 0.035
Salary 0.066 0.013 4.948 0.003

What is the estimated regression equation that relates the amount of the
customer's purchase (Y) to the customer's monthly salary (X)?
Linear Regression
df SS
Regression 1 186952
Residual 13 99236
Total 14 286188

Coefficients Standard Error t Stat p-value


Intercept 78.58 7.540 1.202 0.035
Salary 0.066 0.013 4.948 0.003

What is the estimated regression equation that relates the amount of the
customer's purchase (Y) to the customer's monthly salary (X)?

Ŷ = 78.58 + 0.066 X
Amt. Purchase = 78.58 + 0.066 × (Salary)
Linear Regression
df SS
Regression 1 186952
Residual 13 99236
Total 14 286188

Coefficients Standard Error t Stat p-value


Intercept 78.58 7.540 1.202 0.035
Salary 0.066 0.013 4.948 0.003

Is the regression relationship significant? Use the p-value approach and a 1%
level of significance.
Linear Regression
df SS
Regression 1 186952
Residual 13 99236
Total 14 286188

Coefficients Standard Error t Stat p-value


Intercept 78.58 7.540 1.202 0.035
Salary 0.066 0.013 4.948 0.003

Is the regression relationship significant? Use the p-value approach and a 1%
level of significance.

The p-value for Salary is 0.003 < the level of significance
⇒ Reject H0 ⇒ The regression relationship is significant.
Linear Regression
df SS
Regression 1 186952
Residual 13 99236
Total 14 286188

Coefficients Standard Error t Stat p-value


Intercept 78.58 7.540 1.202 0.035
Salary 0.066 0.013 4.948 0.003

Compute the coefficient of determination between the amount purchased and
the customer's monthly salary. Interpret the result in the context of the
problem.
Linear Regression
df SS
Regression 1 186952
Residual 13 99236
Total 14 286188

Coefficients Standard Error t Stat p-value


Intercept 78.58 7.540 1.202 0.035
Salary 0.066 0.013 4.948 0.003

Compute the coefficient of determination between the amount purchased and
the customer's monthly salary. Interpret the result in the context of the
problem.

R² = 186952 / 286188 = 0.6532 = 65.32%

65.32% of the variability of the amount
purchased can be explained by the linear
relationship with the customer's monthly salary.