
STAT1371: Statistical Data Analysis
Topic 11: Categorical Data Analysis

Semester 2, 2020
Contents
I Testing Categorical Data
I The Goodness of fit test
    I With and without unknown parameters
    I χ² distribution
I Two-way classification table/contingency table
I Tests for Independence
DEPARTMENT OF MATHEMATICS & STATISTICS 2

Testing Categorical Data
I In many applications, data are simply classified into distinct categories.
I These categories need not have a natural numerical ordering.
I For example, in an experiment involving a dihybrid cross of flies, 148 progeny were classified by phenotype as follows:

    Phenotype    AB    Ab    aB    ab    Total
    Count        87    31    25    5     148

I We want to evaluate whether the data resemble a particular distribution.
I For instance, theory predicts a ratio of 9:3:3:1 for AB:Ab:aB:ab.
I In terms of probability, the ratio is $\frac{9}{16} : \frac{3}{16} : \frac{3}{16} : \frac{1}{16}$.
I If we use groups 1 to 4 to represent the 4 phenotypes AB, Ab, aB, ab respectively, then the probability model can be represented as
$$p_1 = \frac{9}{16}, \quad p_2 = \frac{3}{16}, \quad p_3 = \frac{3}{16}, \quad p_4 = \frac{1}{16},$$
where $p_i$ is the probability that an observation falls into group $i$, such that
$$\sum_{i=1}^{g} p_i = 1.$$
I Do the data support the theory?

Motivational setting
I We learned how to deal with this when there are only two categories back in Topic 9.
I Suppose we have $n$ independent trials with $X$ successes and $n - X$ failures, i.e. $X \sim B(n, p)$.
I We wish to test $H_0 : p = p_0$ against $H_1 : p \ne p_0$. If $H_0$ is true then

               Success       Failure
    Observe    O1 = X        O2 = n − X
    Expect     E1 = np0      E2 = n(1 − p0)

I Recall from Topic 9 that large values of $|X - np_0|$ support $H_1$.
I Assuming both $np_0$ and $n(1 - p_0)$ exceed 5, so that the normal approximation is appropriate, large values of
$$\tau = \frac{(X - np_0)^2}{np_0(1 - p_0)}$$
support $H_1$.
I Notice that
$$(O_2 - E_2)^2 = [(n - X) - (n - np_0)]^2 = (X - np_0)^2 = (O_1 - E_1)^2.$$
I Also
$$\frac{1}{np_0} + \frac{1}{n(1 - p_0)} = \frac{1}{np_0(1 - p_0)}.$$
I Thus
$$\tau = \frac{(X - np_0)^2}{np_0(1 - p_0)} = (X - np_0)^2 \left[\frac{1}{np_0} + \frac{1}{n(1 - p_0)}\right] = \frac{(O_1 - E_1)^2}{E_1} + \frac{(O_2 - E_2)^2}{E_2}.$$
I This is a special case of Pearson's χ² statistic.
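The identity between the squared z-statistic and the two-category Pearson sum can be checked numerically. The sketch below uses made-up values (n = 100, X = 60, p0 = 0.5) that are not from the notes; it only illustrates that both expressions return the same number.

```r
# Hypothetical numbers, chosen for illustration only (not from the notes)
n  <- 100   # number of independent trials
x  <- 60    # observed number of successes
p0 <- 0.5   # success probability under H0

# Squared z-statistic from the normal approximation to the binomial
tau_z <- (x - n * p0)^2 / (n * p0 * (1 - p0))

# Pearson form: sum over the two categories of (O - E)^2 / E
O <- c(x, n - x)
E <- c(n * p0, n * (1 - p0))
tau_pearson <- sum((O - E)^2 / E)

# The two values agree
c(tau_z = tau_z, tau_pearson = tau_pearson)
```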

Pearson's χ² Goodness of Fit test
I Assume we have $g$ categories, not just success/failure, and $H_0$ specifies a model giving expected frequencies for each category.
I For instance, we could be testing:
$$H_0 : p_1 = \frac{9}{16},\ p_2 = \frac{3}{16},\ p_3 = \frac{3}{16},\ p_4 = \frac{1}{16} \quad \text{against} \quad H_1 : \text{not } H_0.$$
I The alternative hypothesis is that the true proportions are not all as specified.
I It is not necessary that all proportions are different; $H_0$ is false as soon as at least one proportion is not as specified.
I We can test the claim by comparing the observed frequencies with the expected frequencies under $H_0$.

I Pearson's χ² test statistic (without continuity correction) is
$$\tau = \sum_{i=1}^{g} \frac{(O_i - E_i)^2}{E_i},$$
where
    I $O_i$ is the observed frequency in the $i$th category;
    I $E_i = np_i$ is the expected frequency if $H_0$ is true;
    I $n = \sum O_i = \sum E_i$ is the total number of observations.

I If the data support the proposed model, then we expect each pair $(O_i, E_i)$ to be close together, so that $(O_i - E_i)^2$ is not too large.
I If the data do not support the proposed model, then we expect at least one pair $(O_i, E_i)$ to be far apart, so that $(O_i - E_i)^2$ is large.
I We reject the model if the χ² test statistic is too large.
I The sampling distribution of the statistic is (asymptotically) a chi-squared distribution with $g - 1$ degrees of freedom:
$$P\text{-value} = P(\chi^2_{g-1} \ge \tau_{\text{obs}}).$$
I Note: chi is pronounced as KI.
I The χ² test should only be used when the expected frequencies $E_i$ are all greater than 5.
I Recall this is consistent with requiring $np > 5$ for the normal approximation to the binomial!

Continuity correction
I As we are observing counts/frequencies, categorical data are discrete.
I However, the test involves comparing the (discrete) test statistic to a continuous distribution.
I Therefore we should use the continuity correction, and the test statistic of the χ² Goodness of Fit test becomes:
$$\tau = \sum_{i=1}^{g} \frac{\left(|O_i - E_i| - \frac{1}{2}\right)^2}{E_i}.$$
I This is also known as Yates' continuity correction.
I Note that the continuity correction makes the test statistic smaller, so the test becomes more conservative.
I In the rare cases where the fit is exceptionally good, i.e. $|O_i - E_i| \le \frac{1}{2}$, the correction is reduced so that it is not bigger than the differences themselves.
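A minimal sketch of the corrected statistic in R is given below. The helper name `yates_chisq` and the example counts are made up for illustration, and capping the correction cell by cell at $|O_i - E_i|$ is one reading of the "reduced correction" rule above, not necessarily the convention used in this unit.

```r
# Yates-corrected Pearson statistic for observed counts O and expected counts E.
# Assumption: the correction is capped cell by cell at |O - E|.
yates_chisq <- function(O, E) {
  d <- abs(O - E)
  correction <- pmin(0.5, d)   # never subtract more than the difference itself
  sum((d - correction)^2 / E)
}

# Hypothetical counts, for illustration only
O <- c(18, 22, 30, 30)
E <- rep(25, 4)

tau_plain <- sum((O - E)^2 / E)   # without continuity correction
tau_yates <- yates_chisq(O, E)    # with continuity correction

# The corrected statistic is the smaller of the two
c(plain = tau_plain, yates = tau_yates)
```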
The χ² distribution
I A χ² random variable can only take non-negative values.
I The distribution is not symmetric but is right-skewed:

[Figure: chi-squared density f(x) against x (from 0 to 25) for df = 1, 2, 4 and 9]

I Fun facts:
    I $\chi^2_1 = Z^2$, where $Z \sim N(0, 1)$;
    I If $X \sim \chi^2_\nu$ then $E(X) = \nu$ and $\mathrm{Var}(X) = 2\nu$.
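Both facts can be checked numerically in R. The sketch below compares the cdf of a chi-squared variable with 1 df against the cdf of $Z^2$ at a few arbitrary points, and integrates the density to recover the mean and variance for an arbitrarily chosen $\nu = 4$.

```r
# Fact 1: if Z ~ N(0, 1) then P(Z^2 <= x) = P(-sqrt(x) <= Z <= sqrt(x))
x <- c(0.5, 1, 3.841, 6)                          # arbitrary evaluation points
cdf_chisq1 <- pchisq(x, df = 1)
cdf_z2     <- pnorm(sqrt(x)) - pnorm(-sqrt(x))

# Fact 2: mean and variance by numerical integration, for nu = 4
nu <- 4
mean_nu <- integrate(function(t) t * dchisq(t, df = nu),
                     lower = 0, upper = Inf)$value
var_nu  <- integrate(function(t) (t - nu)^2 * dchisq(t, df = nu),
                     lower = 0, upper = Inf)$value

rbind(cdf_chisq1, cdf_z2)
c(mean = mean_nu, variance = var_nu)   # close to nu and 2 * nu
```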
The χ² Table
I Our principal interest in the χ² distribution is the calculation of P-values for the Goodness of Fit test.
I The χ² tables typically give
$$P(\chi^2_\nu \ge x) = p$$
(in R, p = 1-pchisq(x, nu)), where $\nu$ is the degrees of freedom (row), $p$ is the upper-tail probability (column, top header) and $x$ is given in the body of the table.
I You can find the chisq-table.pdf under Table section on iLearn.
I In R, the following functions are helpful:
    I pdf: dchisq(x, df = nu);
    I cdf: pchisq(q, df = nu);
    I quantile or critical values: qchisq(p, df = nu);
    I random numbers: rchisq(n, df = nu).

Examples of using the χ² Table
i) $P(\chi^2_1 > 3.841) = 0.05$. (Note $1.96^2 = 3.841$ and $P(|Z|^2 > 1.96^2) = 0.05$.)
ii) From the table, we get
$$0.025 < P(\chi^2_{10} > 20) < 0.05.$$
This can be confirmed in R:
1-pchisq(20, 10)
# [1] 0.02925269
iii) $P(13.85 < \chi^2_{24} < 39.36) = 0.925$.
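The three table look-ups can also be reproduced with `qchisq()` and `pchisq()`. The sketch below uses the `lower.tail = FALSE` argument as an alternative to `1 - pchisq(...)`; the variable names are illustrative only.

```r
# i) upper 5% critical value for 1 df, to compare with 1.96^2
crit_1 <- qchisq(0.05, df = 1, lower.tail = FALSE)

# ii) upper-tail probability beyond 20 for 10 df
p_10 <- pchisq(20, df = 10, lower.tail = FALSE)

# iii) probability between the two tabulated values for 24 df
p_24 <- pchisq(39.36, df = 24) - pchisq(13.85, df = 24)

round(c(crit_1 = crit_1, z_squared = 1.96^2, p_10 = p_10, p_24 = p_24), 4)
```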

Back to the phenotypes example
I In an experiment involving a dihybrid cross of flies, 148 progeny were classified by phenotype as follows:

    Group        1     2     3     4     Total
    Phenotype    AB    Ab    aB    ab
    O_i          87    31    25    5     148

I We are testing
$$H_0 : p_1 = \frac{9}{16},\ p_2 = \frac{3}{16},\ p_3 = \frac{3}{16},\ p_4 = \frac{1}{16} \quad \text{against} \quad H_1 : \text{not } H_0.$$
I Under $H_0$, the model specifies the following expected frequencies:

    Group    1                     2                     3                     4                    Total
    E_i      (9/16)×148 = 83.25    (3/16)×148 = 27.75    (3/16)×148 = 27.75    (1/16)×148 = 9.25    148
I The test statistic is
$$\tau = \sum_{i=1}^{4} \frac{\left(|O_i - E_i| - \frac{1}{2}\right)^2}{E_i} \sim \chi^2_{g-1} = \chi^2_3, \quad \text{under } H_0.$$
I The χ² test is valid as all $E_i > 5$.
I The observed value of the test statistic is
$$\tau_{\text{obs}} = \frac{\left(|87 - 83.25| - \frac{1}{2}\right)^2}{83.25} + \frac{\left(|31 - 27.75| - \frac{1}{2}\right)^2}{27.75} + \frac{\left(|25 - 27.75| - \frac{1}{2}\right)^2}{27.75} + \frac{\left(|5 - 9.25| - \frac{1}{2}\right)^2}{9.25}$$
$$= 0.1269 + 0.2725 + 0.1824 + 1.5203 = 2.1021.$$
I The P-value for testing the fit of the model is
$$P\text{-value} = P(\chi^2_3 \ge 2.1021) > 0.1.$$
I Since the P-value is large, we conclude that the data are consistent with $H_0$, i.e. the observed ratio is not significantly different from 9:3:3:1.
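The hand calculation for the phenotype data can be reproduced in R as a check. This is a sketch only: the variable names are illustrative, and the P-value comes from `pchisq()` rather than from the printed table.

```r
# Observed counts and the 9:3:3:1 model from the example
O  <- c(AB = 87, Ab = 31, aB = 25, ab = 5)
p0 <- c(9, 3, 3, 1) / 16
E  <- sum(O) * p0              # expected frequencies under H0

stopifnot(all(E > 5))          # validity condition for the chi-squared test

# Continuity-corrected statistic (every |O - E| exceeds 1/2 here)
tau_obs <- sum((abs(O - E) - 0.5)^2 / E)

df      <- length(O) - 1       # g - 1, no parameters estimated
p_value <- pchisq(tau_obs, df = df, lower.tail = FALSE)

round(c(tau_obs = tau_obs, df = df, p_value = p_value), 4)
```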
Car Accidents Example
I The numbers of fatal accidents on NSW roads in the months with 31 days in 1993 were:

    Jan    Mar    May    July    Aug    Oct    Dec
    44     56     37     42      59     59     63

I Test the claim that the accident rate is the same for all months.
I Let $p_i$ denote the probability that a fatal accident is "allocated" to month $i$.
I We are testing:
$$H_0 : p_i = \frac{1}{7},\ i = 1, 2, \ldots, 7, \quad \text{against} \quad H_1 : \text{not } H_0.$$
I The total number of accidents is 360. Thus $E_i = \frac{360}{7} = 51.43$.

I The test statistic is
$$\tau = \sum_{i=1}^{7} \frac{\left(|O_i - E_i| - \frac{1}{2}\right)^2}{E_i} \sim \chi^2_{g-1} = \chi^2_6, \quad \text{under } H_0.$$
I We can present the information in a table:

    Month                       Jan      Mar      May      July     Aug      Oct      Dec      Total
    O_i                         44       56       37       42       59       59       63       360
    E_i                         51.43    51.43    51.43    51.43    51.43    51.43    51.43    360
    (|O_i − E_i| − 1/2)²/E_i    0.93     0.32     3.77     1.55     0.97     0.97     2.38     10.89

I This means $\tau_{\text{obs}} = 10.89$. Since the upper 10% and 5% points of $\chi^2_6$ are 10.645 and 12.592,
$$P\text{-value} = P(\chi^2_6 \ge 10.89) \in (0.05, 0.1).$$
I As the P-value exceeds 0.05, the data are consistent with $H_0$ at the 5% level, i.e. there is not enough evidence to conclude that the accident rate differs across these months based on these data.
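The accident table can be rebuilt in R as follows. This is a sketch with illustrative variable names; because it works with unrounded expected frequencies, the total may differ from the tabulated value in the second decimal place.

```r
# Fatal accidents in the seven 31-day months of 1993
O <- c(Jan = 44, Mar = 56, May = 37, Jul = 42, Aug = 59, Oct = 59, Dec = 63)

# Equal rates under H0: every month expects the same count
E <- rep(sum(O) / length(O), length(O))

# Per-month contributions with the continuity correction, and their total
contrib <- (abs(O - E) - 0.5)^2 / E
tau_obs <- sum(contrib)

# g - 1 = 6 degrees of freedom, no parameters estimated
p_value <- pchisq(tau_obs, df = length(O) - 1, lower.tail = FALSE)

round(contrib, 2)
round(c(tau_obs = tau_obs, p_value = p_value), 4)
```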

Further application of the χ² Goodness of Fit Test
I The tests so far assume $H_0$ is fully specified, i.e. the category probabilities are determined by some outside consideration before the data are investigated.
I If we want to check the fit of a model that involves unknown parameters, we first have to estimate the parameters from the data.
I Since we use the same data to estimate the parameters and to test the fit, the sampling distribution of the χ² test statistic has to be adjusted.
I The sampling distribution is still χ² but the df are reduced to $g - k - 1$, where
    I $g$ is the number of categories;
    I $k$ is the smallest number of parameters that need to be estimated using the data.

Example: More on phenotypes
I In a backcross experiment to investigate the genetic linkage between two factors A and B in a species of flower, some researchers classified 400 offspring by phenotype as follows:

    AB     Ab    aB    ab
    128    86    74    112

i) Under the no linkage model, the four phenotypes are equally likely. Show that this model is a poor fit.
ii) If linkage is in the coupling phase, the probabilities of the four phenotypes are

    AB              Ab        aB        ab
    (1/2)(1 − p)    (1/2)p    (1/2)p    (1/2)(1 − p)

where $p$ is the recombination fraction and is estimated by the overall proportion of Ab and aB. Show that this model fits the data well.
Example part i)
I The model says that all categories are equally likely.
I We are testing
$$H_0 : p_i = \frac{1}{4},\ i = 1, \ldots, 4, \quad \text{against} \quad H_1 : \text{not } H_0.$$
I The test statistic is
$$\tau = \sum_{i=1}^{4} \frac{\left(|O_i - E_i| - \frac{1}{2}\right)^2}{E_i} \sim \chi^2_{g-1} = \chi^2_3, \quad \text{under } H_0.$$
I The test can be summarised in the following table:

                                AB        Ab        aB        ab        Total
    O_i                         128       86        74        112       400
    E_i                         100       100       100       100       400
    (|O_i − E_i| − 1/2)²/E_i    7.5625    1.8225    6.5025    1.3225    17.21

I As all $E_i > 5$, the χ² test is valid.
I From the table, we have $\tau_{\text{obs}} = 17.21$ and
$$P\text{-value} = P(\chi^2_3 \ge 17.21) < 0.005.$$
I As the P-value is small, there is evidence to reject $H_0$, i.e. the data are not consistent with the model.

Example part ii)
I We are testing
$$H_0 : p_1 = \frac{1 - p}{2},\ p_2 = \frac{p}{2},\ p_3 = \frac{p}{2},\ p_4 = \frac{1 - p}{2} \quad \text{against} \quad H_1 : \text{not } H_0.$$
I Here we estimate $p$ with $\hat{p} = \frac{86 + 74}{400} = 0.4$.
I The test statistic is
$$\tau = \sum_{i=1}^{4} \frac{\left(|O_i - E_i| - \frac{1}{2}\right)^2}{E_i} \sim \chi^2_{g-1-1} = \chi^2_2, \quad \text{under } H_0.$$
I We lose another df as we estimated an extra parameter from the data.
I The test can be summarised in the following table:

                                AB         Ab          aB          ab         Total
    O_i                         128        86          74          112        400
    E_i                         120        80          80          120        400
    (|O_i − E_i| − 1/2)²/E_i    0.46875    0.378125    0.378125    0.46875    1.69375

I As all $E_i > 5$, the χ² test is valid.
I From the table, we have $\tau_{\text{obs}} = 1.69375$ and
$$P\text{-value} = P(\chi^2_2 \ge 1.69375) > 0.1.$$
I As the P-value is large, the data are consistent with $H_0$, i.e. there is no significant difference between the proposed model and the data.
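Both parts of the linkage example can be sketched in R to show how estimating a parameter changes the degrees of freedom. Variable names are illustrative; the only difference between the two fits is the expected frequencies and the value of $k$.

```r
O <- c(AB = 128, Ab = 86, aB = 74, ab = 112)
n <- sum(O)

# Part i): no linkage, all four phenotypes equally likely (k = 0)
E1   <- rep(n / 4, 4)
tau1 <- sum((abs(O - E1) - 0.5)^2 / E1)
p1   <- pchisq(tau1, df = length(O) - 1, lower.tail = FALSE)

# Part ii): coupling phase, recombination fraction estimated from the data (k = 1)
p_hat <- unname((O["Ab"] + O["aB"]) / n)
E2    <- n * c((1 - p_hat) / 2, p_hat / 2, p_hat / 2, (1 - p_hat) / 2)
tau2  <- sum((abs(O - E2) - 0.5)^2 / E2)
k     <- 1
p2    <- pchisq(tau2, df = length(O) - k - 1, lower.tail = FALSE)

round(c(tau_i = tau1, p_value_i = p1, tau_ii = tau2, p_value_ii = p2), 5)
```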

Example: Infections
I 200 groups of 5 insects each were inspected.
I For each group the number of infected insects ($x$) was counted: 3, 2, 5, 1, 0, . . . , 2.
I The data were condensed into the table below, writing $x_i$ for the number of infected and $f_i$ for the corresponding frequency:

    x_i    0     1     2     3     4     5    Total
    f_i    20    62    55    38    20    5    200

I Does the binomial model fit the data?
I We are testing:
$$H_0 : X \sim B(5, p) \quad \text{against} \quad H_1 : \text{not } H_0.$$
I We need to estimate $p$.
I There were 5 × 200 = 1000 insects in total and 391 of these were infected, so an estimate for $p$ is
$$\hat{p} = \frac{391}{1000} = 0.391.$$
I The test can be summarised in the following table:

    i             0         1         2         3         4         5         Total
    p_i           0.0838    0.2689    0.3453    0.2217    0.0712    0.0091    1
    O_i           20        62        55        38        20        5         200
    E_i = np_i    16.76     53.78     69.06     44.34     14.24     1.82      200

where
$$p_i = \binom{5}{i} (0.391)^i (1 - 0.391)^{5-i}, \quad i = 0, \ldots, 5.$$
i
- Wait a minute! One of the cells has an expected value < 5! The χ² test is not valid!
I If any $E_i$ value falls below 5, we can
    I get a larger $n$, i.e. collect a larger sample;
    I "pool" classes (combine counts) in a sensible way.
I After pooling two classes, we carry out the χ² test with one fewer category (and so one fewer df).
I Here we combine the last two cells to get a single category for ≥ 4, and the table becomes:

    i                           0         1         2         3         ≥ 4       Total
    p_i                         0.0838    0.2689    0.3453    0.2217    0.0803    1
    O_i                         20        62        55        38        25        200
    E_i = np_i                  16.76     53.78     69.06     44.34     16.06     200
    (|O_i − E_i| − 1/2)²/E_i    0.4479    1.1082    2.6625    0.7692    4.4355    9.4233

I The test statistic is
$$\tau = \sum_{i=1}^{5} \frac{\left(|O_i - E_i| - \frac{1}{2}\right)^2}{E_i} \sim \chi^2_{g-1-1} = \chi^2_3, \quad \text{under } H_0,$$
as $g = 5$ now.
I Hence $\tau_{\text{obs}} = 9.4233$ and
$$P\text{-value} = P(\chi^2_3 \ge 9.4233) < 0.025.$$
I As the P-value is small, we have evidence against $H_0$, i.e. there is a significant difference between the proposed binomial model and the data.
I A similar procedure can be used to test the fit of other discrete distributions such as the Poisson and negative binomial.
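The binomial fit, including the pooling step, might be coded along the following lines. This is a sketch with illustrative names; it uses `dbinom()` for the cell probabilities and works with unrounded expected frequencies, so the statistic can differ from the tabulated 9.4233 in the later decimal places.

```r
x <- 0:5                            # number infected per group
f <- c(20, 62, 55, 38, 20, 5)       # observed frequencies
m <- 5                              # insects per group
N <- sum(f)                         # number of groups

# Estimate p by the overall proportion of infected insects
p_hat <- sum(x * f) / (m * N)

# Expected frequencies under B(5, p_hat)
E <- N * dbinom(x, size = m, prob = p_hat)

# The last cell has E < 5, so pool the categories x = 4 and x = 5
O_pooled <- c(f[1:4], sum(f[5:6]))
E_pooled <- c(E[1:4], sum(E[5:6]))
stopifnot(all(E_pooled > 5))

tau_obs <- sum((abs(O_pooled - E_pooled) - 0.5)^2 / E_pooled)
df      <- length(O_pooled) - 1 - 1  # g - k - 1 with g = 5 categories, k = 1
p_value <- pchisq(tau_obs, df = df, lower.tail = FALSE)

round(c(p_hat = p_hat, tau_obs = tau_obs, df = df, p_value = p_value), 4)
```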

Testing the fit of a normal model
I Given a dataset $x_1, x_2, \ldots, x_n$ we want to test if the data come from a $N(\mu, \sigma^2)$ population.
1) We first calculate the sample mean, $\bar{x}$, and the sample variance, $s^2$.
2) Form a grouped frequency table and summarise the data with (ideally) 5 to 10 categories.
    I Aim to have at least 5 values in each category.
3) To check against a normal population, work out the expected frequencies for each category by fitting $N(\bar{x}, s^2)$.
4) Calculate the χ² test statistic as usual.
5) To calculate the P-value, use $g - 2 - 1$ df.

Example: Rainfall
I We have $n = 30$ observations of Sydney's annual rainfall (in inches) from 1941-1970 (from the 1972 Australian Year Book):

    26.74    48.29    50.74    31.04    46.47    36.05
    41.45    38.83    66.26    86.63    53.15    59.19
    40.86    41.29    72.46    67.33    27.13    59.19
    59.67    51.01    57.08    44.90    80.11    43.30
    36.01    48.40    52.78    24.56    56.94    43.42

I Test if the rainfall follows a normal distribution.
I We are testing:
$$H_0 : X \sim N(\mu, \sigma^2) \quad \text{against} \quad H_1 : \text{not } H_0.$$
a) We estimate $\mu$ and $\sigma^2$ with $\bar{x} = 49.71$ and $s^2 = 229.15$ respectively.

b) Grouping the data into a frequency table:

    Interval     x ≤ 40    40 < x ≤ 50    50 < x ≤ 60    x > 60    Total
    Frequency    7         9              9              5         30

c) We now calculate the expected frequencies using $X \sim N(49.71, 229.15)$. Then
$$p_1 = P(X \le 40) = P\left(Z \le \frac{40 - 49.71}{\sqrt{229.15}}\right) = P(Z \le -0.64) = 0.2611.$$
I Thus $E_1 = 30 \times 0.2611 = 7.833$.
I To calculate $p_2$:
$$p_2 = P(40 < X \le 50) = P(-0.64 < Z \le 0.02) = 0.2469, \quad E_2 = 30 \times 0.2469 = 7.407.$$
I Similarly,
$$E_3 = 30 \times 0.2437 = 7.311 \quad \text{and} \quad E_4 = 30 - 7.833 - 7.407 - 7.311 = 7.449.$$
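d) The test statistic is
$$\tau = \sum_{i=1}^{4} \frac{\left(|O_i - E_i| - \frac{1}{2}\right)^2}{E_i} \sim \chi^2_{g-2-1} = \chi^2_1, \quad \text{under } H_0.$$
I The χ² test is valid as all $E_i > 5$.
I The observed value of the test statistic is
$$\tau_{\text{obs}} = \frac{\left(|7 - 7.833| - \frac{1}{2}\right)^2}{7.833} + \frac{\left(|9 - 7.407| - \frac{1}{2}\right)^2}{7.407} + \frac{\left(|9 - 7.311| - \frac{1}{2}\right)^2}{7.311} + \frac{\left(|5 - 7.449| - \frac{1}{2}\right)^2}{7.449}$$
$$= 0.0142 + 0.1613 + 0.1934 + 0.5099 = 0.8788.$$
I Here $g = 4$ and $k = 2$ so we have 1 df.
I The P-value is $P(\chi^2_1 \ge 0.8788) \approx 0.349 > 0.1$, with R.
I As the P-value is large, the data are consistent with $H_0$, i.e. the data are consistent with the normal model.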

I This procedure can be modified to test the goodness of fit of other continuous distributions such as the exponential and gamma.
I The procedure is not unique, as the number of categories is not fixed and there are many ways to define the boundaries of these categories.
I The procedure is computationally intensive, but it is good to recycle an existing procedure to test new things.
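The rainfall calculation could be automated roughly as below. This is a sketch under the same grouping (cut points at 40, 50 and 60); names are illustrative, and because it uses unrounded estimates and exact normal probabilities instead of table look-ups, the expected frequencies and statistic differ slightly from the hand calculation.

```r
rain <- c(26.74, 48.29, 50.74, 31.04, 46.47, 36.05,
          41.45, 38.83, 66.26, 86.63, 53.15, 59.19,
          40.86, 41.29, 72.46, 67.33, 27.13, 59.19,
          59.67, 51.01, 57.08, 44.90, 80.11, 43.30,
          36.01, 48.40, 52.78, 24.56, 56.94, 43.42)
n <- length(rain)

# a) estimate the two parameters of the normal model
xbar <- mean(rain)
s    <- sd(rain)

# b) group the data; cut() uses right-closed intervals (a, b] by default
breaks <- c(-Inf, 40, 50, 60, Inf)
O <- as.vector(table(cut(rain, breaks = breaks)))

# c) expected frequencies from the fitted normal distribution
E <- n * diff(pnorm(breaks, mean = xbar, sd = s))

# d) corrected statistic with g - 2 - 1 degrees of freedom
tau_obs <- sum((abs(O - E) - 0.5)^2 / E)
df      <- length(O) - 2 - 1
p_value <- pchisq(tau_obs, df = df, lower.tail = FALSE)

round(rbind(O = O, E = E), 3)
round(c(tau_obs = tau_obs, df = df, p_value = p_value), 4)
```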

Tests for independence
I If we have data classified according to two attributes, then we can construct a contingency table (or two-way classification table), which is a convenient way of presenting the group frequencies.
I For example, we have data on 422 drivers and motorcyclists killed in NSW in 1988. We classify the people by blood alcohol level and sex.

    Alc (g/100ml)    0      (0, 0.08)    [0.08, 0.15)    ≥ 0.15    Total
    Male             206    37           35              76        354
    Female           53     5            4               6         68
    Total            259    42           39              82        422

I Test the claim that gender affects blood alcohol level, i.e. test whether the two categorising variables are dependent (versus independent).
I We would be testing:
$$H_0 : \text{the two variables are independent} \quad \text{against} \quad H_1 : \text{not } H_0.$$
A probability model for contingency tables
I Recall from Topic 2 that independence means that the joint probabilities equal the product of the marginal probabilities, that is
$$P(X = x, Y = y) = P(X = x) \cdot P(Y = y).$$
I Let $p_{ij}$ denote the probability of a victim being of sex $i$ and in alcohol level group $j$; then the independence model says
$$p_{ij} = p_{i\cdot} \times p_{\cdot j},$$
where
    I $p_{i\cdot}$ is the probability of being of sex $i$, $i = 1, \ldots, r$ with $r = 2$;
    I $p_{\cdot j}$ is the probability of being in alcohol group $j$, $j = 1, \ldots, c$ with $c = 4$.
I $r$ and $c$ represent the number of rows and columns in the contingency table.

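I We will use the following notation:
    I $O_{ij}$, the observed number of victims of sex $i$ in alcohol level group $j$;
    I $O_{i\cdot} = \sum_{j=1}^{c} O_{ij}$, the observed number in row $i$, i.e. the row marginal total;
    I $O_{\cdot j} = \sum_{i=1}^{r} O_{ij}$, the observed number in column $j$, i.e. the column marginal total.
I We estimate $p_{i\cdot}$ and $p_{\cdot j}$ by the marginal proportions, i.e.
$$\hat{p}_{i\cdot} = \frac{O_{i\cdot}}{n}, \quad \hat{p}_{\cdot j} = \frac{O_{\cdot j}}{n}.$$
I If $H_0$ is true, that is if the row and column variables are independent, then the expected number $E_{ij}$ in cell $(i, j)$ (that is, row $i$, column $j$, or in our context, sex $i$ and alcohol level group $j$) can be estimated by
$$E_{ij} = n \hat{p}_{i\cdot} \hat{p}_{\cdot j} = n \times \frac{O_{i\cdot}}{n} \times \frac{O_{\cdot j}}{n} = \frac{O_{i\cdot} \times O_{\cdot j}}{n} = \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{\text{table total}}.$$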

I The expected frequencies for the accident data are

    Sex/Alcohol Level    0          (0, 0.08)    [0.08, 0.15)    ≥ 0.15
    Male                 217.265    35.232       32.716          68.787
    Female               41.735     6.768        6.284           13.213

For instance,
$$E_{11} = \frac{259 \times 354}{422} = 217.265; \quad E_{23} = \frac{39 \times 68}{422} = 6.284.$$
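The "row total × column total / table total" rule is an outer product of the two sets of marginal totals, so the whole table of expected frequencies can be produced in one line. A minimal sketch, with an illustrative matrix name `O`:

```r
# Observed counts: rows are sex, columns are blood alcohol level groups
O <- matrix(c(206, 37, 35, 76,
               53,  5,  4,  6),
            nrow = 2, byrow = TRUE,
            dimnames = list(sex = c("Male", "Female"),
                            alcohol = c("0", "(0, 0.08)",
                                        "[0.08, 0.15)", ">= 0.15")))

# E_ij = (row i total) * (column j total) / (table total)
E <- outer(rowSums(O), colSums(O)) / sum(O)

round(E, 3)
```

The same matrix is returned by `chisq.test(O)$expected`, which can serve as a cross-check.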
I The test statistic for this test is
$$\tau = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{\left(|O_{ij} - E_{ij}| - \frac{1}{2}\right)^2}{E_{ij}} \sim \chi^2_{(r-1)(c-1)}, \quad \text{under } H_0.$$
I We lose 1 df for each factor because we have used the marginal totals in calculating the expected values (i.e. $\sum_i \hat{p}_{i\cdot} = 1 = \sum_j \hat{p}_{\cdot j}$).
I For the χ² test to be valid, we still require all $E_{ij} > 5$.

I There are generally two ways to organise all these calculations:
    I Separate tables for $O_{ij}$, $E_{ij}$ and $\frac{\left(|O_{ij} - E_{ij}| - \frac{1}{2}\right)^2}{E_{ij}}$;
    I Put all the information in a single table, with each cell containing $O_{ij}$, $(E_{ij})$ and $\frac{\left(|O_{ij} - E_{ij}| - \frac{1}{2}\right)^2}{E_{ij}}$.
I We will do the former here and will provide an example of the latter in the SGTA.
I If we calculate $\frac{\left(|O_{ij} - E_{ij}| - \frac{1}{2}\right)^2}{E_{ij}}$ for each cell, we get

    Sex/Alcohol Level    0         (0, 0.08)    [0.08, 0.15)    ≥ 0.15
    Male                 0.5334    0.0456       0.0973          0.6551
    Female               2.7767    0.2376       0.5065          3.4106
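I For instance,
$$\frac{\left(|O_{11} - E_{11}| - \frac{1}{2}\right)^2}{E_{11}} = \frac{\left(|206 - 217.265| - \frac{1}{2}\right)^2}{217.265} = 0.5334;$$
$$\frac{\left(|O_{23} - E_{23}| - \frac{1}{2}\right)^2}{E_{23}} = \frac{\left(|4 - 6.284| - \frac{1}{2}\right)^2}{6.284} = 0.5065.$$
As a result,
$$\tau_{\text{obs}} = \sum_{i} \sum_{j} \frac{\left(|O_{ij} - E_{ij}| - \frac{1}{2}\right)^2}{E_{ij}} = 8.2628.$$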
I In this dataset we have $r = 2$ and $c = 4$, so the df for the χ² test is $(2 - 1)(4 - 1) = 3$.
I Since the upper 5% and 2.5% points of $\chi^2_3$ are 7.815 and 9.348, this means that
$$P\text{-value} = P(\chi^2_3 \ge 8.2628) \in (0.025, 0.05).$$
I As the P-value is below 0.05, we have evidence against $H_0$, i.e. there is evidence to suggest that blood alcohol level and sex are related in accident victims.
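The whole independence test can be sketched in R as follows. The statistic is computed by hand because `chisq.test()` only applies the continuity correction to 2 × 2 tables; the variable names are illustrative.

```r
O <- matrix(c(206, 37, 35, 76,
               53,  5,  4,  6),
            nrow = 2, byrow = TRUE,
            dimnames = list(sex = c("Male", "Female"),
                            alcohol = c("0", "(0, 0.08)",
                                        "[0.08, 0.15)", ">= 0.15")))

# Expected frequencies under independence
E <- outer(rowSums(O), colSums(O)) / sum(O)
stopifnot(all(E > 5))                   # validity condition

# Cell contributions with the continuity correction, and their total
contrib <- (abs(O - E) - 0.5)^2 / E
tau_obs <- sum(contrib)

df      <- (nrow(O) - 1) * (ncol(O) - 1)
p_value <- pchisq(tau_obs, df = df, lower.tail = FALSE)

round(contrib, 4)
round(c(tau_obs = tau_obs, df = df, p_value = p_value), 4)
```

Printing `contrib` also shows which cells drive the result: the largest contributions come from the Female row in the first and last alcohol groups.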
Can we visualise the dataset?
I We can create clustered bar charts with the geom_bar() function.
I See Topic 1 for more details.
library(ggplot2)    # provides ggplot() and geom_bar()
library(patchwork)  # lets us place two plots side by side with p1 + p2
# Long-format data: one row per sex/alcohol combination
dat <- data.frame(death = c(206, 37, 35, 76, 53, 5, 4, 6),
                  sex = rep(c("Male", "Female"), c(4, 4)),
                  alcohol = factor(rep(c("0", "(0, 0.08)", "[0.08, 0.15)", ">= 0.15"), 2),
                                   levels = c("0", "(0, 0.08)", "[0.08, 0.15)", ">= 0.15")))
# Deaths by alcohol level, with bars split by sex
p1 <- ggplot(data = dat) +
  geom_bar(aes(x = alcohol, fill = sex, y = death),
           stat = "identity", position = "dodge")
# Deaths by sex, with bars split by alcohol level
p2 <- ggplot(data = dat) +
  geom_bar(aes(x = sex, fill = alcohol, y = death),
           stat = "identity", position = "dodge")
p1 + p2

[Figure: two clustered bar charts of death counts. Left panel: death against alcohol level, bars coloured by sex. Right panel: death against sex, bars coloured by alcohol level.]
I It is evident that the Male group has a much higher frequency in the ≥ 0.15 blood alcohol level group than the Female group.
I The second point is a bit more subtle: the Female group has a higher than expected frequency in the zero alcohol level group.
