
JOMO KENYATTA UNIVERSITY

OF
AGRICULTURE & TECHNOLOGY

SCHOOL OF OPEN, DISTANCE AND eLEARNING


©JKUAT-SODeL

P.O. Box 62000, 00200


Nairobi, Kenya
E-mail: elearning@jkuat.ac.ke

HCBA 3102 Statistics for Business Sciences

LAST REVISION ON June 24, 2013


HCBA 3102 STATISTICS FOR BUSINESS
This presentation is intended to be covered within one week. The notes, examples and exercises should be supplemented with a good textbook. Most of the exercises have solutions/answers appearing elsewhere and accessible by clicking the green Exercise tag. To move back to the same page, click the same tag appearing at the end of the solution/answer.

Errors and omissions in these notes are entirely the responsibility of the author, who should only be contacted through elearning@jkuat.ac.ke. In such a case, kindly ensure that you specify the module, the lesson number and the page before stating the error.

JKUAT, Setting trends in higher Education, Research and Innovation
Contents

6 Hypothesis Testing 2
  6.1 Analysis of Variance (ANOVA)
      • Components of total Variability
      6.1.1 Techniques of One-way ANOVA
  6.2 Chi-Square Test
      6.2.1 Contingency Table Analysis
  6.3 Limitations of Hypothesis Testing
  Solutions to Exercises

LESSON 6
Hypothesis Testing 2

Learning outcomes
Upon completion of this lesson you should be able to:

1. Explain the data considerations for one-way ANOVA
2. Perform basic computations involving ANOVA
3. Carry out a goodness-of-fit test using chi-square
4. Carry out contingency table analysis using chi-square
5. Describe some limitations of hypothesis testing

Introduction
This lesson combines two topics which may appear totally unrelated. While the first topic (ANOVA) deals with ratio/interval versus categorical variables, the second part (chi-square) assumes categorical variables. You will realise that both topics are still on hypothesis testing.

6.1. Analysis of Variance (ANOVA)


We have studied the test of significance of the difference in means between two independent populations. For this we used the standard error of the mean or the standard error of the difference of the two means, using the z-test or t-test. This concept can be extended to the differences in means of more than two independent populations, but in a slightly different manner.

Suppose we want to study the effects of four types of fertilizers, say A, B, C and D, on the yield of sugar cane. We take five plots for each fertilizer, so the 4 fertilizers are applied on 20 plots. We can find the arithmetic mean of the yields of the 5 plots for each fertilizer separately. However, a single t-test cannot test the significance of the differences among these four means. One option is to form the 6 pairs of fertilizers AB, AC, AD, BC, BD and CD and test the difference within each pair, drawing a separate conclusion for each pair. Two difficulties arise:

1. First, the amount of computation increases; and
2. Second, only pairs are tested out of the four fertilizers. We cannot tell whether the difference is significant when all four are taken together.

In such a situation we need a method of testing significance that avoids these two difficulties, that is, a test of significance between the means of more than two samples. Here, test of significance means testing the hypothesis of whether the means of several samples differ significantly or not. To test the difference among several sample means we use a statistical technique known as Analysis of Variance. The main objective of the analysis of variance is to test the hypothesis of whether the means of several groups differ significantly or not.
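
As a small illustration of the first difficulty, the sketch below simply enumerates the pairwise comparisons that the t-test approach would require for the four fertilizers. The variable names are illustrative and not part of the original notes.

```python
from itertools import combinations
from math import comb

fertilizers = ["A", "B", "C", "D"]

# Every unordered pair of fertilizers that a pairwise t-test approach would need.
pairs = ["".join(pair) for pair in combinations(fertilizers, 2)]

# The number of pairs is k(k - 1)/2 for k groups.
n_pairs = comb(len(fertilizers), 2)

print(pairs)
print(n_pairs)
```

With k groups the number of pairwise tests grows as k(k − 1)/2, which is one reason a single overall test is preferred.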

• Components of total Variability

When observations are classified into groups or samples on the basis of a single criterion, this is called one-way classification. Examples include the yield of sugar cane from 20 plots classified on the basis of four types of fertilizer, and the marks obtained by students of different colleges. In general, for one-way classification, total variability is partitioned into two parts, that is:

Total Variation = Variation between groups + Variation within groups.

Assumptions of Analysis of Variance
The analysis of variance is based on certain assumptions, as given below:
1. Normality of the Distribution: The population for each sample must be normally distributed with mean $\mu$ and unknown variance $\sigma^2$.
2. Independence of Samples: All the sample observations must be selected randomly.
3. Additivity: The total variation of the various sources of variation should be additive.
4. Equal variances (but unknown): The populations from which the $k$ samples are drawn have means $\mu_1, \mu_2, \ldots, \mu_k$ and unknown common variance $\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2 = \sigma^2$.
5. The error components are independent and have mean 0 and variance $\sigma^2$.
The tests of significance performed in the analysis of variance are meaningful only under these assumptions.

6.1.1. Techniques of One-way ANOVA


1. In one-way analysis of variance there are $k$ groups, one from each of $k$ normal populations with common variance $\sigma^2$ and means $\mu_1, \mu_2, \ldots, \mu_k$. The number of observations $n_i$ in the groups may be equal or unequal, i.e.

$$n_1 + n_2 + \cdots + n_k = n$$

2. Linear Model

$$x_{ij} = \mu + \alpha_i + e_{ij}$$

where
$x_{ij}$ = the $j$th observation in the $i$th group, $i = 1, 2, \ldots, k$, $j = 1, 2, \ldots, n_i$
$\mu$ = the general mean
$\alpha_i$ = effect of the $i$th factor level $= \mu_i - \mu$
$e_{ij}$ = effect of error or random term.

3. Null Hypothesis ($H_0$) and Alternative Hypothesis ($H_1$):

$H_0$: The means of the populations are equal, i.e.

$$\mu_1 = \mu_2 = \cdots = \mu_k$$

$H_1$: At least two of the means are not equal.

4. Computations:
(i) Calculate the sum of observations in each sample and of all observations.
Sum of sample observations: $\sum x_1, \sum x_2, \ldots, \sum x_k$.
Sum of the squares of the group observations: $\sum x_1^2, \sum x_2^2, \ldots, \sum x_k^2$.
(ii) Calculate the correction factor $CF = \dfrac{T^2}{n}$,
where $T$ = sum of all the observations $= \sum x$ and $n$ = total number of observations.
(iii) Calculate the group means $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_k$ and their common (grand) mean $\bar{\bar{x}}$, where $\bar{x}_k = \dfrac{\sum x_k}{n_k}$ and $\bar{\bar{x}} = \dfrac{\sum x}{n}$.
(iv) Calculate the total sum of squares by the formula

$$TSS = \sum (x - \bar{\bar{x}})^2 = \left(\sum x_1^2 + \sum x_2^2 + \cdots + \sum x_k^2\right) - \frac{T^2}{n} = \sum x^2 - \frac{(\sum x)^2}{n}$$


(v) Calculate the sum of squares between samples by the formula

$$SSB = n_1(\bar{x}_1 - \bar{\bar{x}})^2 + n_2(\bar{x}_2 - \bar{\bar{x}})^2 + \cdots + n_k(\bar{x}_k - \bar{\bar{x}})^2 = \left[\frac{(\sum x_1)^2}{n_1} + \cdots + \frac{(\sum x_k)^2}{n_k}\right] - \frac{T^2}{n}$$

(vi) Calculate the sum of squares within samples by the formula

$$SSW = \sum (x_1 - \bar{x}_1)^2 + \sum (x_2 - \bar{x}_2)^2 + \cdots + \sum (x_k - \bar{x}_k)^2$$

The sum of squares within samples may also be computed as

$$SSW = TSS - SSB$$

The sum of squares within samples is also called the error sum of squares.
(vii) Calculate the mean sums of squares:

$$MSSB = \text{Mean sum of squares between samples} = \frac{SSB}{k-1} = \frac{\text{Sum of squares between samples}}{\text{Degrees of freedom}}$$

$$MSSW = \text{Mean sum of squares within samples} = \frac{SSW}{n-k} = \frac{\text{Sum of squares within samples}}{\text{Degrees of freedom}}$$

Total number of degrees of freedom $= n - 1$, where $n - 1 = (k - 1) + (n - k)$.

(viii) Obtain the variance ratio $F$:

$$F = \frac{MSSB}{MSSW} = \frac{\text{Variance between samples}}{\text{Variance within samples}}$$

Remark 1. In general, $MSSB$ is greater than $MSSW$, so $MSSB$ is taken in the numerator.

(ix) Interpretation of the $F$-ratio:
Compare the calculated value of $F$ with the tabulated value. Let

$F_c$ = calculated value of $F$
$F_t = F_{0.05}(v_1, v_2)$ = tabulated value of $F$

Here
$v_1$ = degrees of freedom for the numerator
$v_2$ = degrees of freedom for the denominator

If $F_c > F_t$, i.e. the calculated value of $F$ exceeds the tabulated value of $F$, we say the difference among sample means is significant and conclude that not all the population means are equal, i.e. we reject the null hypothesis.

If $F_c < F_t$, the difference among sample means is not significant, i.e. we do not reject the null hypothesis.

(x) The results of the calculations can be presented in tabular form. This table is called the analysis of variance table.

Analysis of Variance Table

Source of Variation   Sum of Squares (SS)                             Degrees of Freedom   Mean Sum of Squares (MSS)    F_c
Between samples       $SSB = \sum n_k(\bar{x}_k - \bar{\bar{x}})^2$   $k - 1$              $MSSB = \dfrac{SSB}{k-1}$    $\dfrac{MSSB}{MSSW}$
Within samples        $SSW = \sum (x_k - \bar{x}_k)^2$                $n - k$              $MSSW = \dfrac{SSW}{n-k}$
Total                 $\sum (x - \bar{\bar{x}})^2$                    $n - 1$

Example. Consider the following data, relating to yields of 4 varieties of wheat in 3 blocks:

Table 6.1: Varieties data

Varieties    Blocks
              1    2    3
I            10    9    8
II            7    8    6
III           8    5    5
IV            5    8    5

Test whether the varieties are significantly different with regard to yield, ignoring variation between blocks.

Solution:
Computation of Arithmetic Mean

Blocks       Varieties
              I    II   III   IV
1            10     7     8    5
2             9     8     5    8
3             8     6     5    5
Total        27    21    18   18
Mean (x̄)      9     7     6    6

Mean of all means

$$\bar{\bar{x}} = \frac{9 + 7 + 6 + 6}{4} = \frac{28}{4} = 7$$

Variance between Samples

Sum of the squares of the deviations

$$\begin{aligned}
&= n_1(\bar{x}_1 - \bar{\bar{x}})^2 + n_2(\bar{x}_2 - \bar{\bar{x}})^2 + n_3(\bar{x}_3 - \bar{\bar{x}})^2 + n_4(\bar{x}_4 - \bar{\bar{x}})^2 \\
&= 3(9 - 7)^2 + 3(7 - 7)^2 + 3(6 - 7)^2 + 3(6 - 7)^2 \\
&= 3 \times 4 + 3 \times 0 + 3 \times 1 + 3 \times 1 \\
&= 12 + 0 + 3 + 3 = 18
\end{aligned}$$

Degrees of freedom $df_1 = v_1 = k - 1 = 4 - 1 = 3$

Mean sum of squares between samples (i.e. varieties)

$$= \frac{\sum n_k(\bar{x}_k - \bar{\bar{x}})^2}{k - 1} = \frac{18}{3} = 6$$

Variance within Samples: Sum of the squares of the deviations

$$\begin{aligned}
&= \sum (x_1 - \bar{x}_1)^2 + \sum (x_2 - \bar{x}_2)^2 + \sum (x_3 - \bar{x}_3)^2 + \sum (x_4 - \bar{x}_4)^2 \\
&= 2 + 2 + 6 + 6 = 16
\end{aligned}$$

Degrees of freedom $df_2 = v_2 = n - k = 12 - 4 = 8$

Mean sum of squares within samples (i.e. varieties) $= \dfrac{\sum (x - \bar{x}_k)^2}{n - k} = \dfrac{16}{8} = 2$

$$F\text{-ratio} = \frac{MSSB}{MSSW} = \frac{6}{2} = 3$$

Analysis of Variance

Source of variation    S.S.   d.f.           M.S.S.   F-Ratio
Between samples        18     (k − 1) = 3    6        F = MSSB/MSSW = 3
Within samples         16     (n − k) = 8    2

Conclusion
For $v_1 = 3$, $v_2 = 8$, $F_t = 4.07$ at the 5% level of significance. Since $F_c < F_t$, the four varieties of wheat are not significantly different with regard to yield, i.e. the observed difference is due to fluctuations of sampling.

Varieties
         I                 II               III              IV
         (10 − 7)² = 9     (7 − 7)² = 0     (8 − 7)² = 1     (5 − 7)² = 4
         (9 − 7)² = 4      (8 − 7)² = 1     (5 − 7)² = 4     (8 − 7)² = 1
         (8 − 7)² = 1      (6 − 7)² = 1     (5 − 7)² = 4     (5 − 7)² = 4
Total    14                2                9                9

Total sum of squares = 14 + 2 + 9 + 9 = 34
Degrees of freedom = n − 1 = 12 − 1 = 11
Total Variation $= \dfrac{\sum (x - \bar{\bar{x}})^2}{n - 1} = \dfrac{34}{11} = 3.09$
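
The computations in the worked example can be checked with a short script. The following is a minimal sketch, assuming the shortcut (correction factor) formulas given in steps (ii) to (viii). The function name `one_way_anova` and the dictionary layout are illustrative choices, and the tabulated value 4.07 is copied from the example rather than computed.

```python
# Yields from Table 6.1, grouped by wheat variety (blocks are ignored).
yields = {
    "I": [10, 9, 8],
    "II": [7, 8, 6],
    "III": [8, 5, 5],
    "IV": [5, 8, 5],
}


def one_way_anova(groups):
    """Return the one-way ANOVA quantities using the correction-factor formulas."""
    all_obs = [x for group in groups for x in group]
    n = len(all_obs)                     # total number of observations
    k = len(groups)                      # number of groups (samples)
    total = sum(all_obs)                 # T = sum of all observations
    cf = total ** 2 / n                  # correction factor T^2 / n
    tss = sum(x ** 2 for x in all_obs) - cf
    ssb = sum(sum(g) ** 2 / len(g) for g in groups) - cf
    ssw = tss - ssb                      # error sum of squares
    mssb = ssb / (k - 1)
    mssw = ssw / (n - k)
    return {
        "TSS": tss, "SSB": ssb, "SSW": ssw,
        "df_between": k - 1, "df_within": n - k,
        "MSSB": mssb, "MSSW": mssw, "F": mssb / mssw,
    }


result = one_way_anova(list(yields.values()))

F_TABULATED = 4.07  # F(3, 8) at the 5% level, as quoted in the example
reject_h0 = result["F"] > F_TABULATED

for name, value in result.items():
    print(name, value)
print("Reject H0:", reject_h0)
```

For the wheat data this should reproduce the sums of squares 18, 16 and 34 and the F-ratio of 3 obtained by hand above.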

Formatting data for Computer based Analysis


Most computer programs that compute t-tests and ANOVA tests require your data to be in a specific form. Consider the data in Table 6.1. The formatted data in a computer file, such as in SPSS, should look like Table 6.2.

Table 6.2: Formatted Varieties Data


IDNo   Variety   Block   Yield (Kg)
1      I         1       10
2      II        1       7
3      III       1       8
4      IV        1       5
5      I         2       9
6      II        2       8
7      III       2       5
8      IV        2       8
9      I         3       8
10     II        3       6
11     III       3       5
12     IV        3       5

Notice that the data now consists of three variables, and both One-Way ANOVA and Two-Way ANOVA can be applied to the data. It is always good practice to include an identification number for each observation.
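
The reshaping from Table 6.1 to Table 6.2 can also be done programmatically. The sketch below is one possible way to do it in plain Python. The dictionary keys (`IDNo`, `Variety`, `Block`, `Yield`) mirror the column headings of Table 6.2, while the variable names are illustrative.

```python
# Table 6.1 in "wide" form: one list of block yields per variety.
wide = {
    "I": [10, 9, 8],
    "II": [7, 8, 6],
    "III": [8, 5, 5],
    "IV": [5, 8, 5],
}

# Build the "long" form of Table 6.2: one row per observation.
long_rows = []
id_no = 1
n_blocks = 3
for block in range(1, n_blocks + 1):
    for variety, block_yields in wide.items():
        long_rows.append({
            "IDNo": id_no,
            "Variety": variety,
            "Block": block,
            "Yield": block_yields[block - 1],
        })
        id_no += 1

for row in long_rows:
    print(row["IDNo"], row["Variety"], row["Block"], row["Yield"])
```

Each row now carries the grouping variables alongside the measurement, which is the layout most statistical packages expect.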
Example. Consider the following data as captured in an SPSS file.

Suppose we wish to test whether performance in Maths depends on School category. One can proceed to perform several t-tests, taking a pair at a time (Boarding vs Day, Boarding vs Mixed and Day vs Mixed), but this is not recommended. The appropriate test is One-Way ANOVA. For gender, on the other hand, since it has only two categories, the test is simply a comparison of two group means. To determine whether the differences are by chance, the appropriate test is the independent samples t-test with the hypotheses

$$H_0: \mu_m = \mu_f \quad \text{versus} \quad H_1: \mu_m \neq \mu_f.$$

Alternatively, a non-mathematician can use statements instead, as follows:
H0: There is no significant difference between the two groups in Maths
H1: There is a significant difference between the two groups in Maths
The first table SPSS gives is for descriptives. From Table 6.3, it is apparent that the mean score for males is higher than that of females. However, this is not conclusive, as the test is not about the sample data but about the population from which the data was taken. The standard error of the mean, $s.e.(\bar{x}) = \dfrac{s}{\sqrt{n}}$ (where $s$ is the standard deviation and $n$ is the sample size), is a measure of the accuracy of the estimates.

Table 6.3: Anova results 1


The second table that SPSS gives contains the inferential results. Before you report the t-values and the corresponding p-values, check Levene's test for equality of variances, which guides you in determining which row to use in reporting the results.

Table 6.4: ANOVA results 2


Since $F = 0.251$, $p = 0.624 > 0.05$ (5%), Levene's test tells us that the variance within the males' data is similar to the variance within the females' scores. This tells us to use the first row (Equal variances assumed). Since $t_{14} = -4.51$, $p < 0.001$, we reject the null hypothesis and conclude that there is a significant difference in the mean scores of the two groups. The males have a significantly higher mean score ($\bar{x}_m = 79.14$ against $\bar{x}_f = 60.11$).

Exercise 1. Carry out a similar test based on the performance scores in Kiswahili and English.

6.2. Chi-Square Test
Chi-Square Tests are based on Chi-square distribution which has
the following characteristics
• It is positively skewed
• It is non-negative
• It is based on degrees of freedom
• When the degrees of freedom change a new distribution is created

The Chi-Square distribution can be used to test whether observed data differ significantly from theoretical expectations. For example, for a fair six-sided die, the probability of any given outcome on a single roll would be $\frac{1}{6}$. The data in Table 6.5 were obtained by rolling a six-sided die 36 times. However, as can be seen, some outcomes occurred more frequently than others. For example, a "3" came up nine times whereas a "4" came up only two times. Are these data consistent with the hypothesis that the die is a fair die? One way to test whether the die is fair is to conduct a significance test. The null hypothesis is that the die is fair. This hypothesis is tested by computing the probability of obtaining frequencies as discrepant or more discrepant from a uniform distribution of frequencies as obtained in the sample. If this probability is sufficiently low, then the null hypothesis that the die is fair can be rejected.

Table 6.5: Outcome Frequencies from a six-sided Die


Outcome Frequency
1 8
2 5
3 9
4 2
5 7
6 5

The first step in conducting the significance test is to compute the expected frequency for each outcome given that the null hypothesis is true. For example, the expected frequency of a "1" is 6, since the probability of a "1" coming up is $\frac{1}{6}$ and there were a total of 36 rolls of the die.

Expected frequency ($E$) is given by

$$E = \left(\frac{1}{6}\right)(36) = 6$$

Note: The expected frequencies are expected only in a theoretical sense. We do not really "expect" the observed frequencies to match the "expected frequencies" exactly.

Let $O_f$ and $E_f$ be the observed and expected frequencies respectively. We are looking for significant differences between the actual or observed cell frequencies in a table ($O_f$) and those that would be expected by random chance ($E_f$). For each category we compute a chi-square value as

$$\frac{(E_f - O_f)^2}{E_f}$$

Table 6.6 shows these calculations.

Table 6.6: Outcome Frequencies from a Six-Sided Die

Outcome   E   O   (E − O)²/E
1         6   8   0.667
2         6   5   0.167
3         6   9   1.500
4         6   2   2.667
5         6   7   0.167
6         6   5   0.167

Next we add up all the values in the last column of Table 6.6 to get

$$\sum \frac{(E_f - O_f)^2}{E_f} = 0.667 + 0.167 + \cdots + 0.167 = 5.333$$

The sampling distribution of

$$\sum \frac{(E_f - O_f)^2}{E_f}$$

is approximately Chi-Square on $k - 1$ degrees of freedom, where $k$ is the number of categories. Therefore, for this problem the test statistic is

$$\chi^2_5 = 5.333$$

which means the value of Chi-Square with 5 degrees of freedom is 5.333. From a Chi-Square calculator it can be determined that the probability of a Chi-Square of 5.333 or larger is 0.377. Therefore, the null hypothesis that the die is fair cannot be rejected. The hypotheses here are

$H_0$: no difference between $O_f$ and $E_f$
$H_1$: there is a difference between $O_f$ and $E_f$
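
The die calculation can be reproduced with a few lines of Python. This is a minimal sketch using the observed frequencies of Table 6.5. The critical value 11.070 (5 degrees of freedom, 5% level) is taken from a standard chi-square table and is an addition not quoted in the notes, which use the p-value instead.

```python
# Observed frequencies for outcomes 1..6 from Table 6.5.
observed = [8, 5, 9, 2, 7, 5]

n_rolls = sum(observed)                   # total number of rolls
k = len(observed)                         # number of categories
expected = [n_rolls / k] * k              # fair die: each outcome expected equally

# Contribution of each category: (E - O)^2 / E
contributions = [(e - o) ** 2 / e for o, e in zip(observed, expected)]
chi_square = sum(contributions)
df = k - 1

# Assumption: tabulated chi-square critical value for df = 5 at the 5% level.
CRITICAL_5PCT_DF5 = 11.070
reject_h0 = chi_square > CRITICAL_5PCT_DF5

print([round(c, 3) for c in contributions])
print(round(chi_square, 3), df, reject_h0)
```

Comparing the statistic with the tabulated critical value leads to the same decision as the p-value approach described above: the hypothesis of a fair die is not rejected.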

Example. In a certain country, the Bureau of Statistics has census records indicating that 63.9% of the population is married, 7.7% widowed, 6.9% divorced (and not re-married), and 21.5% single (never been married). A sample of 500 adults from one province showed that 310 were married, 40 widowed, 30 divorced, and 120 single. At the .05 significance level, can we conclude that this province is different from the rest of the country?

Solution
Step 1: State the hypotheses

$$H_0: O_f = E_f \qquad H_1: O_f \neq E_f$$

Step 2: Compute the test statistic

$$\chi^2 = \sum \left[\frac{(O_f - E_f)^2}{E_f}\right]$$

This requires us to first compute the expected frequencies and the chi-square values, as shown in the following table.


Status     O     E       (O − E)²/E
Married    310   319.5   0.2825
Widowed    40    38.5    0.0584
Divorced   30    34.5    0.5870
Single     120   107.5   1.4535
Total      500           2.3814

Step 3: Decision rule: We reject the null hypothesis if $\chi^2_c > \chi^2_{3,\,0.05} = 7.815$. Since $\chi^2_c = 2.3814 < 7.815$, we do not reject the null hypothesis and conclude that the province is not different from the rest of the country in terms of marital status.

Exercise 2. The following data on reported crimes was collected from a certain city.

Day         Frequency
Monday      89
Tuesday     45
Wednesday   50
Thursday    56
Friday      60

At the .05 level of significance, test to determine whether there is a difference in the number of reported crimes by day of the week. (Hint: Expected frequency is the average.)

6.2.1. Contingency Table Analysis
Contingency table analysis is used to test whether two traits or
variables are related.
• Each observation is classified according to two variables. The usual hypothesis testing procedure is used.
• If $N_r$ and $N_c$ are the number of rows and number of columns respectively, then the degrees of freedom are $(N_r - 1) \times (N_c - 1)$.
• Let $T_r$ be the row total, $T_c$ be the column total and $T_t$ be the overall total for the table. Then the expected frequency is computed as

$$E = \frac{T_r \times T_c}{T_t}$$

That is, Expected Frequency = (row total)(column total)/(grand total).
• The test statistic is

$$\chi^2_c = \sum \left[\frac{(O_f - E_f)^2}{E_f}\right]$$

but one can use

$$\chi^2_c = \sum \frac{O^2}{E} - T_t$$

for faster computation.
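
The equivalence of the two forms can be seen by expanding the square. The short derivation below relies only on the fact that the observed and the expected frequencies both add up to the overall total $T_t$.

```latex
\begin{aligned}
\sum \frac{(O - E)^2}{E}
  &= \sum \frac{O^2 - 2OE + E^2}{E} \\
  &= \sum \frac{O^2}{E} - 2\sum O + \sum E \\
  &= \sum \frac{O^2}{E} - 2T_t + T_t \\
  &= \sum \frac{O^2}{E} - T_t
\end{aligned}
```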


NOTE: Two conditions must be satisfied for the results of this chi-square test to be valid:
1. The expected frequencies must be at least 1 in every cell.
2. At least 80% of the cells should have expected frequencies greater than or equal to 5 (that is, no more than 20% of the cells may have expected frequencies below 5).
When these conditions fail for a contingency table, similar rows/columns are normally combined to ensure that the conditions are satisfied before the test is carried out.
Exercise 3. Is there a relationship between the location of an accident and the gender of the person involved in the accident? A sample of 150 accidents reported to the police was classified by location and gender as given below.

Gender   Work   Home   Other
Male     60     20     10
Female   20     30     10

At the .05 level of significance, can we conclude that gender and the location of the accident are related?
Solution
The hypothesis to be tested is
H0 : Location of accident and gender not related
H1 : Location of accident and gender related
Let $T_r$ be the row total, $T_c$ be the column total and $T_t$ be the overall total for the table. Values in parentheses are the expected values per cell, computed as

$$E = \frac{T_r \times T_c}{T_t}$$

Gender   Work      Home      Other     Total
Male     60 (48)   20 (30)   10 (12)   90
Female   20 (32)   30 (20)   10 (8)    60
Total    80        50        20        150

The test statistic is

$$\chi^2_{calc} = \sum \frac{O^2}{E} - T_t = \frac{60^2}{48} + \frac{20^2}{30} + \cdots + \frac{10^2}{8} - 150 = 16.67$$

The degrees of freedom are $(N_r - 1) \times (N_c - 1) = 1 \times 2 = 2$. Since $16.67 > \chi^2_{2,\,0.05} = 5.99$, we reject $H_0$ and conclude that the place of accident is gender related.
Notice that males are more likely than expected to be involved in an accident at work (i.e. expected = 48 but observed = 60), while female accident cases are largely at home.
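
The expected frequencies and the test statistic for the accident data can be verified with the sketch below. It computes the statistic with both the definitional and the shortcut formula. The variable names are illustrative, and the critical value 5.991 (2 degrees of freedom, 5% level) is taken from a standard chi-square table.

```python
# Observed counts: rows are Male, Female; columns are Work, Home, Other.
observed = [
    [60, 20, 10],
    [20, 30, 10],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequency per cell: (row total)(column total) / grand total.
expected = [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]

cells = [
    (o, e)
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
]

chi_square = sum((o - e) ** 2 / e for o, e in cells)        # definitional form
shortcut = sum(o ** 2 / e for o, e in cells) - grand_total  # shortcut form

df = (len(observed) - 1) * (len(observed[0]) - 1)
CRITICAL_5PCT_DF2 = 5.991  # tabulated chi-square value for df = 2 at the 5% level
reject_h0 = chi_square > CRITICAL_5PCT_DF2

print(expected)
print(round(chi_square, 2), round(shortcut, 2), df, reject_h0)
```

Both forms of the statistic should agree, which is a useful arithmetic check when working by hand.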
NOTE: Where there is only 1 degree of freedom, the approximation is not reliable if expected frequencies are below 10. In such a case, reduce the absolute value of each difference between observed and expected frequencies by 0.5 before squaring; this is called Yates' correction for continuity. The formula becomes

$$\chi^2_c = \sum \left[\frac{(|O_f - E_f| - 0.5)^2}{E_f}\right]$$
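
To see the effect of the correction, the following sketch computes the statistic for a 2 × 2 table with and without Yates' correction. The counts in `hypothetical_table` are invented purely for illustration and do not come from these notes, and the function name is likewise an assumption.

```python
def chi_square_2x2(observed, yates=True):
    """Chi-square statistic for a 2x2 table, optionally with Yates' correction."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / total  # expected frequency
            diff = abs(o - e)
            if yates:
                diff -= 0.5                            # continuity correction
            statistic += diff ** 2 / e
    return statistic


# Hypothetical counts (not from the notes); every expected frequency is 5.
hypothetical_table = [[3, 7], [7, 3]]

corrected = chi_square_2x2(hypothetical_table, yates=True)
uncorrected = chi_square_2x2(hypothetical_table, yates=False)

print(round(corrected, 2), round(uncorrected, 2))
```

The corrected statistic is smaller than the uncorrected one, so the correction makes the test more conservative when there is only one degree of freedom.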

A dentist notices that most people have cavities on the right-hand side of their mouths. He conjectures that this is because they are right-handed and as a result they clean the left side of their mouths more thoroughly. To test his hypothesis he selects at random 30 right-handed and 20 left-handed patients and classifies them further on.
Example. The table shows whether or not the subjects suffered from heart disease and how their snoring habits were classified by their partners.
                    Never   Occasionally snores   Snores every night
Heart disease       40      90                    120
No heart disease    80      90                    80
Use a $\chi^2$ test, at the 5% significance level, to investigate whether frequency of snoring is related to heart disease. (10 marks)
Solution: The hypothesis to be tested is
H0: Snoring and heart disease are not related
H1: Snoring and heart disease are related

                    Never     Occasionally snores   Snores every night   Total
Heart disease       40 (60)   90 (90)               120 (100)            250
No heart disease    80 (60)   90 (90)               80 (100)             250
Total               120       180                   200                  500

Let $T_r$ be the row total, $T_c$ be the column total and $T_t$ be the overall total for the table. Values in parentheses are the expected values per cell, computed as

$$E = \frac{T_r \times T_c}{T_t}$$

The test statistic is

$$\chi^2 = \sum \frac{(O_f - E_f)^2}{E_f} = \frac{(40 - 60)^2}{60} + \cdots + \frac{(80 - 100)^2}{100} = 21.333$$

The degrees of freedom are $(N_r - 1) \times (N_c - 1) = 1 \times 2 = 2$. Since $\chi^2_c = 21.333 > \chi^2_{2,\,0.05} = 5.99$, we reject the null hypothesis and conclude that the data provide sufficient evidence that snoring and heart disease are related.

Exercise 4. The table shows whether or not the subjects suffered from heart disease and how their snoring habits were classified by their partners.

                    Never   Occasionally snores   Snores every night
Heart disease       50      90                    130
No heart disease    70      90                    70

Use a $\chi^2$ test, at the 5% significance level, to investigate whether frequency of snoring is related to heart disease. (10 marks)

6.3. Limitations of Hypothesis Testing


We have described above some important tests often used for testing hypotheses, on the basis of which important decisions may be made. But there are several limitations of these tests which should always be borne in mind by a researcher. Important limitations are as follows:

1. The tests should not be used in a mechanical fashion. It should be kept in view that testing is not decision-making itself; the tests are only useful aids for decision-making. Hence “proper interpretation of statistical evidence is important to intelligent decisions.”

2. Tests do not explain the reasons why the differences exist. They simply indicate whether the difference is due to fluctuations of sampling or to other reasons, but the tests do not tell us which other reason(s) are causing the difference.

3. Results of significance tests are based on probabilities and as such cannot be expressed with full certainty. When a test shows that the difference is statistically significant, it simply suggests that the difference is probably not due to chance.

4. Statistical inferences based on significance tests cannot be said to be entirely correct evidence concerning the truth of the hypotheses. This is especially so in the case of small samples, where the probability of drawing erroneous inferences is generally higher. For greater reliability, the size of samples should be sufficiently enlarged.

All these limitations suggest that in problems of statistical significance, the inference techniques (or the tests) must be combined with adequate knowledge of the subject matter along with the ability to make good judgment.

Suggested materials for further reading


1. Gujarati, D.N. (2006). Basic Econometrics. 3rd Edition, McGraw-Hill, Inc., New York.
2. Freund, J.E. and Williams, F.J. (1979). Modern Business Statistics. Pitman Publishing Limited, London.
3. Gupta, S.C. and Kapoor, V.K. (1995). Fundamentals of Mathematical Statistics. Sultan Chand and Sons, New Delhi.
4. Keller, G., Warrack, B. and Bartel, H. (1994). Statistics for Management and Economics. 3rd Edition. Wadsworth Publishing Company, Belmont, California, USA.

Solutions to Exercises
Exercise 2. Assume equal expected frequencies: 300/5 = 60 for each day.
Using these numbers, the computed test statistic is 19.7.
The degrees of freedom are (5 − 1) = 4.
Therefore, the critical value is 9.488.
Decision: Reject the null hypothesis and conclude that there is a difference in the number of reported crimes by day of the week.
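
For reference, the arithmetic behind the test statistic in Exercise 2, using the observed frequencies 89, 45, 50, 56 and 60 and the common expected frequency of 60, can be written out as follows.

```latex
\begin{aligned}
\chi^2 &= \frac{(89-60)^2 + (45-60)^2 + (50-60)^2 + (56-60)^2 + (60-60)^2}{60} \\
       &= \frac{841 + 225 + 100 + 16 + 0}{60} = \frac{1182}{60} = 19.7
\end{aligned}
```
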
Exercise 4. The hypothesis to be tested is
H0: Snoring and heart disease are not related
H1: Snoring and heart disease are related

                    Never       Occasionally snores   Snores every night   Total
Heart disease       50 (64.8)   90 (97.2)             130 (108)            270
No heart disease    70 (55.2)   90 (82.8)             70 (92)              230
Total               120         180                   200                  500

Let $T_r$ be the row total, $T_c$ be the column total and $T_t$ be the overall total for the table. Values in parentheses are the expected values per cell, computed as

$$E = \frac{T_r \times T_c}{T_t}$$

The test statistic is

$$\chi^2 = \sum \frac{(O_f - E_f)^2}{E_f} = \frac{(50 - 64.8)^2}{64.8} + \cdots + \frac{(70 - 92)^2}{92} = 18.25$$

The degrees of freedom are $(N_r - 1) \times (N_c - 1) = 1 \times 2 = 2$. Since $\chi^2_c = 18.25 > \chi^2_{2,\,0.05} = 5.99$, we reject the null hypothesis and conclude that the data provide sufficient evidence that snoring and heart disease are related.
