Statistics and Probability
A Contingency Table
Suppose X and Y are two categorical variables, where X has I categories and Y has J categories. We display the I × J possible combinations of outcomes in a rectangular table having I rows for the categories of X and J columns for the categories of Y. A table in which the cells contain frequency counts of outcomes is called a contingency table.
We define π_{ij} = P(X = i, Y = j) as the probability that (X, Y) falls in the cell in row i and column j. The probabilities {π_{ij}} form the joint distribution of X and Y. Note that

Σ_{i=1}^{I} Σ_{j=1}^{J} π_{ij} = 1.
Marginal Distributions
The marginal distribution of X is {π_{i+}}, obtained by the row sums:

π_{i+} = Σ_{j=1}^{J} π_{ij},

which represents the ith row total. The marginal distribution of Y is

π_{+j} = Σ_{i=1}^{I} π_{ij},

which represents the jth column total. For example, in a 2 × 2 table we have

π_{1+} = π_{11} + π_{12},  π_{+1} = π_{11} + π_{21}.
Suppose we have sample data displayed in a contingency table, such as the cross-classification of belief in afterlife below. The observed values in the table are represented by n_{ij}; that is, the cell counts are denoted by n_{ij}, with

n = Σ_{i=1}^{I} Σ_{j=1}^{J} n_{ij}.

Remember that I and J are the total number of rows and columns.
Belief in Afterlife

Gender    Yes             No or Undecided   Total
Female    n_{11} = 435    n_{12} = 147      n_{1+} = 582
Male      n_{21} = 375    n_{22} = 134      n_{2+} = 509
Total     n_{+1} = 810    n_{+2} = 281      n = 1091
Example: Sample Proportions

Belief in Afterlife

Gender    Yes                         No or Undecided   Total
Female    p_{11} = 435/1091 = 0.398   p_{12} = 0.135    p_{1+} = 0.533
Male      p_{21} = 0.344              p_{22} = 0.123    p_{2+} = 0.467
Total     p_{+1} = 0.742              p_{+2} = 0.258    1.000
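These joint and marginal proportions can be reproduced in R (the language of the code at the end of these notes); a minimal sketch, with the table entered by hand:

# Belief-in-afterlife counts (rows: gender; columns: belief)
afterlife <- matrix(c(435, 147, 375, 134), nrow = 2, byrow = TRUE,
                    dimnames = list(Gender = c("Female", "Male"),
                                    Belief = c("Yes", "No or Undecided")))
prop.table(afterlife)               # joint proportions p_ij = n_ij / n
addmargins(prop.table(afterlife))   # the same table with marginal proportions added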
Conditional Probabilities
In the table for belief in afterlife, the response variable is belief in afterlife (usually represented by the column variable) and gender is an explanatory variable (usually represented by the row variable). Now we wish to study the conditional distributions of belief in the afterlife, given gender.
For females,
Proportion of yes responses =435/582= 0.747
Proportion of no responses = 147/582=0.253
For males,
Proportion of yes responses = 0.737
Proportion of no responses = 0.263
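These conditional proportions are row proportions, which R computes by normalizing each row of the table; a short sketch reusing the afterlife matrix defined above:

# Conditional distributions of belief given gender (divide each row by its total)
prop.table(afterlife, margin = 1)   # Female: 0.747, 0.253; Male: 0.737, 0.263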
Independence
In general, two variables are statistically independent if all joint probabilities equal
the product of their marginal probabilities, i.e.
π_{ij} = π_{i+} π_{+j}, for i = 1, …, I and j = 1, …, J.

Note that independence means that the conditional distributions of Y are identical at each level of X.
Difference of Proportions
Suppose that the row totals are fixed (look at the table for the cross-classification of aspirin use and myocardial infarction below!) and hence we have a binomial model. Now assume that the two categories of Y are success and failure. Let π_1 be the probability of success in row 1 and π_2 be the probability of success in row 2. The difference π_1 − π_2 compares the success probabilities in the two rows. This measure is used to quantify the strength of association between the variables.
If the counts in the two rows are independent samples, the estimated standard error of p_1 − p_2 is given by

σ̂(p_1 − p_2) = √[ p_1(1 − p_1)/n_{1+} + p_2(1 − p_2)/n_{2+} ].
Example: For the belief in afterlife data,
i) estimate the difference of proportions,
ii) estimate its standard error, and
iii) compute the 95% confidence interval for the true difference of proportions.
We let p_1 and p_2 denote the sample proportions of yes responses for females and males. The sample proportions are p_1 = 0.747 and p_2 = 0.737, so p_1 − p_2 = 0.010 and

σ̂(p_1 − p_2) = √( 0.747 × 0.253/582 + 0.737 × 0.263/509 ) = 0.02656.
Confidence Interval

(p_1 − p_2) ± z_{α/2} σ̂(p_1 − p_2),

where z_{α/2} denotes the standard normal percentile having right-tail probability equal to α/2.
Inference using the confidence interval for the true difference in proportions
Always check whether the constructed interval includes the value 0 (zero): if it does, there is no evidence of a difference between the proportions, i.e. the data are consistent with the two variables being independent (not associated).
Example: Consider the aspirin use and myocardial infarction (MI) data shown in the table further below, and again estimate the difference of proportions, its standard error, and the 95% confidence interval.
Solution
First, the sample proportions of MI cases are p_1 = 189/11034 = 0.0171 for the placebo group and p_2 = 104/11037 = 0.0094 for the aspirin group. Next, the difference of proportions is p_1 − p_2 = 0.0077, and the estimate of its standard error is
σ̂(p_1 − p_2) = √( 0.0171 × 0.9829/11034 + 0.0094 × 0.9906/11037 ) = 0.0015.
Therefore, a 95% confidence interval (CI) for the true π_1 − π_2 is given by

(p_1 − p_2) ± z_{α/2} σ̂(p_1 − p_2) = 0.0077 ± 1.96 × 0.0015 = (0.005, 0.011).
Interpretation
The 95% confidence interval for the true difference of proportions lies between 0.005 and 0.011. Since the interval does not contain zero, the proportion of MI cases appears genuinely higher for the placebo group.
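The same computation is easy to carry out in R; a minimal sketch, with the counts taken from the aspirin table shown later:

n1 <- 11034; n2 <- 11037            # group sizes (placebo, aspirin)
p1 <- 189 / n1                      # sample proportion of MI cases, placebo
p2 <- 104 / n2                      # sample proportion of MI cases, aspirin
se <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
(p1 - p2) + c(-1, 1) * qnorm(0.975) * se   # 95% CI, approximately (0.005, 0.011)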
Remark
A difference between two proportions of a certain fixed size may have greater importance when both proportions are near 0 or 1 than when they are near the middle of the range. For example, the difference between 0.010 and 0.001 is the same as the difference between 0.410 and 0.401, namely 0.009, but the first difference seems more important than the latter, since ten times as many subjects have adverse reactions with one drug as with the other.
Relative Risk
The relative risk is another measure used to quantify the strength of association between the variables.
In 2 × 2 tables, the relative risk is the ratio of the success probabilities for the two groups, π_1/π_2.
The proportions 0.010 and 0.001 have a relative risk of 10.0, whereas the proportions 0.410 and 0.401 have a relative risk of 1.02.
Note that a relative risk of 1.0 occurs when π_1 = π_2, which indicates that the response is independent of the group.
Example: For the aspirin and MI data, the sample relative risk is p_1/p_2 = 0.0171/0.0094 = 1.818. This means that the sample proportion of MI cases was 82% higher for the group taking placebo.
Inference
A large-sample 95% confidence interval for the true relative risk π_1/π_2 is [1.4330, 2.3059].
Interpretation: We are 95% confident that the proportion of MI cases for those taking placebo is between 1.43 and 2.31 times the proportion for those taking aspirin.
Note
This confidence interval for the relative risk indicates that the risk of MI is at
least 43% higher for the placebo group.
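The notes do not show how this interval was obtained; a sketch of the usual large-sample approach, which exponentiates a normal interval for log(p_1/p_2) using the standard delta-method standard error (reusing n1, n2, p1, p2 from the sketch above):

rr <- p1 / p2                                        # sample relative risk, about 1.82
se_log_rr <- sqrt((1 - p1) / (n1 * p1) + (1 - p2) / (n2 * p2))
exp(log(rr) + c(-1, 1) * qnorm(0.975) * se_log_rr)   # approximately (1.43, 2.31)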
The C.I. (0.005, 0.011) for the difference of proportions, π_1 − π_2, makes it seem as if the two groups differ by a trivial amount, but the relative risk shows that the difference may have important public health implications. Hence, using the difference of proportions alone to compare two groups can be somewhat misleading when the proportions are BOTH close to zero.
It is sometimes informative to also compute the ratio of "failure" probabilities, (1 − π_1)/(1 − π_2). This takes a different value than the ratio of the success probabilities. When one of the two outcomes has small probability, one normally computes the ratio of the probabilities for that outcome.
Odds Ratio
The odds ratio is another measure used to quantify the strength of association between the variables. The quantity

θ = odds_1/odds_2 = [π_1/(1 − π_1)] / [π_2/(1 − π_2)]

is called the odds ratio.
Properties of Odds Ratio (OR)
In terms of the joint probabilities of a 2 × 2 table,

θ = (π_{11}/π_{12}) / (π_{21}/π_{22}) = (π_{11} π_{22}) / (π_{12} π_{21}),

which is why θ is also called the cross-product ratio.
The sample odds ratio is equal to the ratio of the sample odds in the two rows,

θ̂ = [p_1/(1 − p_1)] / [p_2/(1 − p_2)] = (n_{11}/n_{12}) / (n_{21}/n_{22}) = (n_{11} n_{22}) / (n_{12} n_{21}).
Example: Aspirin Use and Myocardial Infarction (MI)

           Myocardial Infarction
Group      Yes    No      Total
Placebo    189    10845   11034
Aspirin    104    10933   11037
Solution
For the placebo group, the odds of MI are 189/10845 = 0.0174, and for the aspirin group, the odds of MI are 104/10933 = 0.0095. Therefore, the sample odds ratio is 0.0174/0.0095 = 1.832, which is interpreted as: the estimated odds of MI are 83% higher for the placebo group.
For small to moderate sample sizes, the distribution of the sample odds ratio θ̂ is highly skewed. When θ = 1, θ̂ cannot be much smaller than θ, but it can be much larger than θ with nonnegligible probability.
Consider the log odds ratio, log θ.
X and Y being independent implies log θ = 0.
The log odds ratio is symmetric about zero, in the sense that reversal of rows or reversal of columns changes only its sign.
The sample log odds ratio log θ̂ has a less skewed distribution and is well approximated by the normal distribution.
The asymptotic standard error of log θ̂ is given by

ASE(log θ̂) = √( 1/n_{11} + 1/n_{12} + 1/n_{21} + 1/n_{22} ).
Therefore, a large-sample confidence interval for θ is given by

exp[ log θ̂ ± z_{α/2} ASE(log θ̂) ].
Construct a 95% confidence interval for the odds ratio, and interpret.
Solution

ASE(log θ̂) = √( 1/189 + 1/10933 + 1/10845 + 1/104 ) = 0.123.

Since log θ̂ = log(1.832) = 0.605, a 95% confidence interval for log θ is 0.605 ± 1.96 × 0.123 = (0.365, 0.846), and exponentiating the endpoints gives the interval (1.44, 2.33) for θ.
Interpretation
Since the confidence interval for θ does not contain 1.0, the true odds of MI appear to differ between the two groups. The interval predicts that the odds of MI are at least 44% higher for subjects taking placebo than for those taking aspirin.
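This interval is easy to reproduce in R from the four cell counts; a minimal sketch:

n11 <- 189; n12 <- 10845; n21 <- 104; n22 <- 10933   # aspirin and MI table
or <- (n11 * n22) / (n12 * n21)                      # sample odds ratio, 1.832
ase <- sqrt(1/n11 + 1/n12 + 1/n21 + 1/n22)           # ASE of log odds ratio, 0.123
exp(log(or) + c(-1, 1) * qnorm(0.975) * ase)         # 95% CI, approximately (1.44, 2.33)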
Notes
The endpoints of the interval are not equally distant from θ^ =1.83, because
the sampling distribution of θ^ is skewed to the right.
The sample odds ratio is 0 or ∞ if any n_{ij} = 0, and it is undefined if both entries in a row or column are zero. In such a case, we consider the slightly modified estimator

θ̃ = [(n_{11} + 0.5)(n_{22} + 0.5)] / [(n_{12} + 0.5)(n_{21} + 0.5)].

In the ASE formula, the n_{ij}'s are likewise replaced by n_{ij} + 0.5.
A sample odds ratio of 1.832 does not mean that p_1 is 1.832 times p_2. The following simple relation exists between the odds ratio and the relative risk:

Odds Ratio = [p_1/(1 − p_1)] / [p_2/(1 − p_2)] = Relative Risk × (1 − p_2)/(1 − p_1).
If p_1 and p_2 are close to 0, the odds ratio and relative risk take similar values. The aspirin use table illustrates this similarity: for each group, the sample proportion of MI cases is close to zero, so the sample odds ratio of 1.83 is similar to the sample relative risk of 1.82 obtained earlier. In such a case, an odds ratio of 1.83 does mean that p_1 is approximately 1.83 times p_2.
This relationship between the odds ratio and the relative risk is useful. For some datasets, calculation of the relative risk is not possible, yet one can compute the odds ratio and use it to approximate the relative risk. Such studies occur when the sampling design is retrospective: one can construct conditional distributions for the explanatory variable within each level of the fixed response, but it is not possible to estimate the probability of the response outcome of interest, or to compute the difference of proportions or relative risk for that outcome. We illustrate this in the following example.
The odds ratio is determined by the conditional distributions in either
direction, and can be computed even if we have a study design that measures
a response on X within each level of Y.
To test H_0 that the cell probabilities equal certain fixed values {π_{ij}}, we let n_{ij} be the cell frequencies and n the total sample size; the values μ_{ij} = nπ_{ij} are then the expected cell frequencies under H_0.
χ² = Σ_{i=1}^{I} Σ_{j=1}^{J} (n_{ij} − μ_{ij})² / μ_{ij}.
Properties of χ²
This statistic takes its minimum value of zero when all n_{ij} = μ_{ij}. For a fixed sample size, greater differences between {n_{ij}} and {μ_{ij}} produce larger χ² values and stronger evidence against H_0. For large sample sizes, the χ² statistic has approximately a chi-squared distribution with appropriate degrees of freedom.
Likelihood-Ratio Test
The likelihood ratio Λ is the maximum of the likelihood when the parameters satisfy H_0, divided by the maximum of the likelihood over the full parameter space. This likelihood ratio cannot exceed 1. If the maximized likelihood is much larger when the parameters are not forced to satisfy H_0, then the ratio Λ is far below 1 and there is strong evidence against H_0. Hence, a small likelihood ratio implies deviation from H_0.
The likelihood-ratio statistic is

G² = −2 log Λ = 2 Σ_{i=1}^{I} Σ_{j=1}^{J} n_{ij} log(n_{ij}/μ_{ij}).
The Pearson χ² and likelihood-ratio G² statistics have the same large-sample distribution under the null hypothesis, and they commonly yield the same conclusions.
Tests of Independence
Under H_0: independence, the expected frequencies are μ_{ij} = nπ_{i+}π_{+j}. Usually {π_{i+}} and {π_{+j}} are unknown, and hence so are these expected values. We estimate the expected frequencies by substituting sample proportions for the unknown probabilities, leading to

μ̂_{ij} = n p_{i+} p_{+j} = n (n_{i+}/n)(n_{+j}/n) = n_{i+} n_{+j}/n.
The test statistics become

χ² = Σ_{i=1}^{I} Σ_{j=1}^{J} (n_{ij} − μ̂_{ij})² / μ̂_{ij},

G² = 2 Σ_{i=1}^{I} Σ_{j=1}^{J} n_{ij} log(n_{ij}/μ̂_{ij}),

and both have large-sample chi-squared distributions with (I − 1)(J − 1) degrees of freedom.
Example: Party Identification and Gender

           Party Identification
Gender    Democrat   Independent   Republican   Total
Female    165        47            191          403
Male      279        73            225          577
Total     444        120           416          980
Use the Pearson’s Chi-square and Likelihood ratio to test whether Party
Identification and Gender are associated.
The test statistics are χ 2=7.01 and G2 = 7.00 with degrees of freedom = (I -1)(J - 1)
=(2-1)(3-1)=2.
Each statistic has P-value = 0.03. Therefore, the above test statistics suggest that
Party Identification and Gender are related.
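Both statistics can be reproduced in R; a sketch assuming the party labels given in the table above (chisq.test returns the Pearson statistic, and G² is computed directly from its observed and expected counts):

party <- matrix(c(165, 47, 191, 279, 73, 225), nrow = 2, byrow = TRUE,
                dimnames = list(Gender = c("Female", "Male"),
                                Party = c("Democrat", "Independent", "Republican")))
fit <- chisq.test(party)        # Pearson X-squared = 7.01, df = 2, p-value = 0.03
2 * sum(fit$observed * log(fit$observed / fit$expected))   # G2 = 7.00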
Residuals
A test statistic and its P-value simply describe the evidence against the null
hypothesis. In order to understand better the nature of evidence against H 0, a cell
by cell comparison of observed and estimated frequencies is necessary.
Remarks
Pearson χ² tests indicate only the degree of evidence for an association; they cannot answer other questions, such as the nature of the association.
These χ² tests are not always applicable: we need large datasets to apply them. The approximation is often poor when n/(IJ) < 5.
The values of χ² and G² do not depend on the ordering of the rows. Thus we ignore some information when the data are ordinal.
For ordinal data, it is important to look for types of associations when there is dependence. It is quite common to assume that as the level of X increases, responses on Y tend to increase, or responses on Y tend to decrease, toward higher levels of X.
The simplest and most common analysis assigns scores to categories and measures the degree of linear trend or correlation. The method used is known as the Mantel-Haenszel chi-squared test (Mantel and Haenszel 1959).
Suppose u_1 ≤ u_2 ≤ … ≤ u_I denote scores for the rows, and let v_1 ≤ v_2 ≤ … ≤ v_J denote scores for the columns, where the scores have the same ordering as the category levels. Define the correlation between X and Y as

r = [ Σ_i Σ_j u_i v_j n_{ij} − (Σ_i u_i n_{i+})(Σ_j v_j n_{+j})/n ] / √{ [ Σ_i u_i² n_{i+} − (Σ_i u_i n_{i+})²/n ] [ Σ_j v_j² n_{+j} − (Σ_j v_j n_{+j})²/n ] }.
Independence between the variables implies that the true value of this correlation equals zero. The larger the correlation is in absolute value, the further the data fall from independence in this linear dimension. The test statistic M² = (n − 1)r² has a large-sample chi-squared distribution with d.f. = 1.
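A sketch of this trend test in R, using a small hypothetical 2 × 3 table (the table from the notes' own example is not reproduced here); the cell counts and the scores u and v below are illustrative assumptions:

tab <- matrix(c(10, 20, 5, 8, 25, 12), nrow = 2, byrow = TRUE)  # hypothetical counts
u <- c(1, 2); v <- c(1, 2, 3)                                   # row and column scores
# Expand the table to one record per observation, then correlate the scores
x <- rep(rep(u, times = ncol(tab)), as.vector(tab))   # row score of each observation
y <- rep(rep(v, each = nrow(tab)), as.vector(tab))    # column score of each observation
r <- cor(x, y)
M2 <- (sum(tab) - 1) * r^2
pchisq(M2, df = 1, lower.tail = FALSE)                # P-value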
For the example, the test statistics are χ² = 12.1, d.f. = 4, P-value = 0.02 and G² = 6.2, d.f. = 4, P-value = 0.19; these two tests provide mixed signals. The cell percentages and adjusted residuals suggest that there may be a linear trend.
Remarks
The correlation r has limited use as a descriptive measure of tables.
Different choices of monotone scores usually give similar results. However, this may not happen when the data are very unbalanced, i.e. when some categories have many more observations than other categories. If we had taken (1, 2, 3, 4, 5) as the row scores in our example, then M² = 1.83 with P-value = 0.18, a much weaker conclusion. It is usually better to use one's own judgment, selecting scores that reflect the distances between categories.
Fisher's Exact Test
When the sample size is small, one can perform inference using exact distributions rather than large-sample approximations. This section illustrates such exact inference for two-way contingency tables.
Example (Fisher's tea-tasting experiment): A lady claimed to be able to tell whether milk or tea was poured into a cup first. She was given 8 cups, 4 of each type, and asked to identify the 4 cups with milk poured first. We need to test whether she can tell accurately; this is equivalent to testing H_0: θ = 1 against H_1: θ > 1. We cannot use the previously discussed tests, as we have a very small sample size.
When θ = 1, the probability of a particular value n_{11} for that count equals the hypergeometric probability

p(n_{11}) = C(n_{1+}, n_{11}) C(n_{2+}, n_{+1} − n_{11}) / C(n, n_{+1}).
The outcomes at least as favorable as the observed data are n_{11} = 3 and n_{11} = 4, given the row and column totals. Hence,

p(3) = C(4, 3) C(4, 1) / C(8, 4) = 16/70 = 0.229,
p(4) = C(4, 4) C(4, 0) / C(8, 4) = 1/70 = 0.0143,

so the exact P-value is P(n_{11} ≥ 3) = 0.229 + 0.014 = 0.243.
Three-Way Contingency Tables
In this section, the main objective is to analyze the association between two categorical variables X and Y while controlling for other covariates that can influence the association. Thus, we study the effect of X on Y by holding such covariates constant; in other words, we study the association between X and Y given the levels of a control variable Z.
Partial Tables
These are simply two-way tables between X and Y at separate levels of Z. The two-
way contingency table obtained by combining the partial tables is called the X - Y
marginal table.
Each cell count in the marginal table is a sum of counts from the same cell location
in the partial tables.
The marginal table, rather than controlling Z, ignores it and does not contain any
information about Z.
The associations in partial tables are called conditional associations, because they refer to the effect of X on Y while controlling for Z. Conditional associations in partial tables can be quite different from associations in marginal tables. In fact, it can be misleading to analyze only a marginal table of a multi-way contingency table, as the following example illustrates.
Simpson's Paradox
The death penalty data analyzed below are an example of Simpson's paradox: the result that a marginal association can have a different direction from the conditional associations is called Simpson's paradox. This result applies to quantitative as well as categorical variables.
Odds ratios
Consider 2 × 2 × K tables, where K denotes the number of levels of a control variable Z. Let {n_{ijk}} denote the observed frequencies and let {μ_{ijk}} denote their expected frequencies. Within a fixed level k of Z, the conditional odds ratio is

θ_{XY(k)} = (μ_{11k} μ_{22k}) / (μ_{12k} μ_{21k}),

while the marginal odds ratio, computed from the X–Y marginal table, is

θ_{XY} = (μ_{11+} μ_{22+}) / (μ_{12+} μ_{21+}).
Similar formulas with the μ_{ijk}'s replaced by the n_{ijk}'s provide sample estimates of θ_{XY(k)} and θ_{XY}; for example,

θ̂_{XY} = (n_{11+} n_{22+}) / (n_{12+} n_{21+}).
Using the death penalty example data, the sample conditional odds ratios for victim's race are:

White victims: θ̂_{XY(1)} = (53 × 37)/(414 × 11) = 0.43, which means that the sample odds of white defendants receiving the death penalty were 43% of the sample odds for black defendants.

Black victims: θ̂_{XY(2)} = (0 × 139)/(16 × 4) = 0.0, since the death penalty was never given to white defendants having black victims.

The sample marginal odds ratio is θ̂_{XY} = (53 × 176)/(430 × 15) = 1.45, which means the sample odds of the death penalty were 45% higher for white defendants than for black defendants.
However, we have just observed that those odds were smaller for a white defendant than for a black defendant within each level of victim's race. This reversal in the association when we control for victim's race illustrates Simpson's paradox.
We now consider the true relationship between X and Y, controlling for Z. If X and Y are independent in each partial table, then X and Y are said to be conditionally independent given Z. All conditional odds ratios between X and Y are then equal to 1. Conditional independence of X and Y, given Z, does not imply marginal independence of X and Y: even if the odds ratios between X and Y equal 1 at each level of Z, the marginal odds ratio may differ from 1.
θ_{XY(1)} = (18 × 8)/(12 × 12) = 1.0
θ_{XY(2)} = (2 × 32)/(8 × 8) = 1.0
These show that given clinic, response and treatment are conditionally
independent.
θ_{XY} = (20 × 40)/(20 × 20) = 2.0,
which means that the variables are not marginally independent. Why are the odds
of success twice as high for treatment A as treatment B when we ignore clinic?
It is misleading to study only the marginal tables, concluding that successes are
more likely with treatment A than with treatment B. Subjects within a particular
clinic are likely to be more homogeneous than the overall sample, and response is
independent of treatment in each clinic.
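These conditional and marginal odds ratios can be verified in R; a sketch assuming the partial tables are laid out with treatment in rows and response (success, failure) in columns:

clinic <- array(c(18, 12, 12, 8,     # clinic 1: successes 18, 12; failures 12, 8
                  2, 8, 8, 32),      # clinic 2: successes 2, 8; failures 8, 32
                dim = c(2, 2, 2))
apply(clinic, 3, function(t) t[1, 1] * t[2, 2] / (t[1, 2] * t[2, 1]))  # 1, 1
marg <- clinic[, , 1] + clinic[, , 2]                # X-Y marginal table
marg[1, 1] * marg[2, 2] / (marg[1, 2] * marg[2, 1])  # 2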
Homogeneous Association
Cochran-Mantel-Haenszel Test
In the kth partial table, the row totals are n_{1+k}, n_{2+k} and the column totals are n_{+1k}, n_{+2k}. Given both these totals, n_{11k} has a hypergeometric distribution, and that distribution determines all other cell counts in the kth partial table.
Under the null hypothesis of independence, the mean and variance of n_{11k} are

μ_{11k} = E(n_{11k}) = n_{1+k} n_{+1k} / n_{++k},
Var(n_{11k}) = n_{1+k} n_{2+k} n_{+1k} n_{+2k} / [ n_{++k}² (n_{++k} − 1) ].

The Cochran-Mantel-Haenszel statistic is

CMH = [ Σ_{k=1}^{K} (n_{11k} − μ_{11k}) ]² / Σ_{k=1}^{K} Var(n_{11k}),

which has a large-sample chi-squared distribution with d.f. = 1 under H_0.
CMH takes larger values when (n_{11k} − μ_{11k}) is consistently positive or consistently negative across tables, rather than positive for some and negative for others. The test is inappropriate when the association varies widely among the partial tables; it works best when the X–Y association is similar in each partial table.
Example: Smoking and Lung Cancer in Chinese Cities (fragment of the full table)

City        Group        Cancer Yes   Cancer No   θ̂_{XY(k)}   μ_{11k}   Var(n_{11k})
…           Nonsmokers   121          215
Zhengzhou   Smokers      182          156         1.59        169.0     28.3
            Nonsmokers   72           98
Taiyuan     Smokers      60           99          2.37        53.0      9.0
            Nonsmokers   11           43
Nanchang    Smokers      104          89          2.00        96.5      11.0
            Nonsmokers   21           36
In this example, Σ_k n_{11k} = 2930, Σ_k μ_{11k} = 2562.5, and Σ_k Var(n_{11k}) = 482.1, so CMH = (2930 − 2562.5)²/482.1 = 280.1 with d.f. = 1. Since 280.1 is enormous compared with a chi-squared distribution with d.f. = 1, there is extremely strong evidence against conditional independence.
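R's mantelhaen.test performs this test on a 2 × 2 × K array. A sketch using only the three cities whose rows are fully visible above; the full data set (called lungtab in the R output shown later) contains more cities, so the numbers from this sketch will not match the 280.1 reported for the complete data:

lung3 <- array(c(182, 72, 156, 98,    # Zhengzhou: smokers/nonsmokers x cancer yes/no
                 60, 11, 99, 43,      # Taiyuan
                 104, 21, 89, 36),    # Nanchang
               dim = c(2, 2, 3))
mantelhaen.test(lung3, correct = FALSE)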
Now we wish to combine the sample odds ratios from the K partial tables into a single summary measure of partial association. Assume homogeneous association, that is,

θ_{XY(1)} = θ_{XY(2)} = … = θ_{XY(K)}.
The Mantel-Haenszel estimator of the common odds ratio is

θ̂_MH = [ Σ_k n_{11k} n_{22k}/n_{++k} ] / [ Σ_k n_{12k} n_{21k}/n_{++k} ],

and the estimated variance of log θ̂_MH is

σ̂²(log θ̂_MH) = [ Σ_k (n_{11k} + n_{22k})(n_{11k} n_{22k})/n_{++k}² ] / { 2 [ Σ_k n_{11k} n_{22k}/n_{++k} ]² }
  + [ Σ_k { (n_{11k} + n_{22k})(n_{12k} n_{21k}) + (n_{12k} + n_{21k})(n_{11k} n_{22k}) } / n_{++k}² ] / { 2 [ Σ_k n_{11k} n_{22k}/n_{++k} ][ Σ_k n_{12k} n_{21k}/n_{++k} ] }
  + [ Σ_k (n_{12k} + n_{21k})(n_{12k} n_{21k})/n_{++k}² ] / { 2 [ Σ_k n_{12k} n_{21k}/n_{++k} ]² }.
Using our example, we find that the MH estimate is θ̂_MH = 2.17, with estimated standard error

σ̂(log θ̂_MH) = 0.046.

A 95% C.I. for the common log odds ratio is 0.777 ± (1.96)(0.046) = (0.686, 0.868). Therefore, a 95% C.I. for the common odds ratio is (e^0.686, e^0.868) = (1.98, 2.38).
Note that if the true odds ratios are not identical but do not vary drastically, θ̂_MH still provides a useful summary of the K conditional associations. Similarly, the CMH test is a powerful summary of the evidence against the hypothesis of conditional independence, as long as the sample associations fall in the same direction.
data: lungtab
Mantel-Haenszel X-squared = 280.1375, df = 1, p-value < 2.2e-16
alternative hypothesis: true common odds ratio is not equal to 1
95 percent confidence interval:
1.984002 2.383249
sample estimates:
common odds ratio
2.174482
Breslow-Day Test
Now we test the hypothesis that the odds ratio between X and Y is the same at each level of Z (testing for homogeneous association in 2 × 2 × K tables), i.e. the null hypothesis

H_0: θ_{XY(1)} = θ_{XY(2)} = … = θ_{XY(K)}.

The Breslow-Day statistic has the Pearson form, Σ_k Σ_i Σ_j (n_{ijk} − μ̂_{ijk})²/μ̂_{ijk}, where μ̂_{ijk} is the expected cell frequency under H_0. The formula for computing μ̂_{ijk} is complicated.
Under the null hypothesis, the Breslow-Day statistic has a large-sample chi-squared distribution with K − 1 degrees of freedom. The sample size should be relatively large in each partial table, with μ̂_{ijk} ≥ 5 in at least about 80% of the cells. The statistic can be computed using standard statistical software.
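Base R has no built-in Breslow-Day test, but add-on packages provide it; for example, assuming the DescTools package is installed, the test could be applied to a 2 × 2 × K array such as lung3 from the earlier sketch:

# install.packages("DescTools")      # if not already installed
DescTools::BreslowDayTest(lung3)     # chi-squared statistic with K - 1 df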
We can also estimate the parameters, which describe the effects in a more
informative way.
Example: Challenger O-ring
For the 23 space shuttle flights that occurred before the Challenger mission disaster
in 1986, the following table shows the temperature at the time of flight and
whether at least one primary O-ring suffered thermal distress.
The Data
We wish to test whether there was any association between temperature and
thermal distress. This will be discussed later!
Clearly, the fit from linear regression has a major structural defect, that is,
probabilities fall between 0 and 1, whereas linear functions take values over the
entire real line. Hence, the linear regression predicts probabilities below 0 and
above 1 for sufficiently large or small temperature (x) values.
The logistic function seems to fit these data decently.
Example: Horseshoe Crabs and Satellites
We shall analyze some data from a study of nesting horseshoe crabs, in which each
female horseshoe crab in the study had a male crab attached to her in her nest. The
study investigated factors that affect whether the female crab had any other males,
called satellites, residing nearby.
Explanatory variables thought possibly to affect this included the female crab's
color, spine condition, weight, and carapace width. The response outcome for each
female crab is her number of satellites.
For now we consider the width alone as a predictor of the response. To obtain a
clearer picture of overall trend, we grouped the female crabs into a set of width
categories, (≤23.25, 23.25-24.25, 24.25-25.25, 25.25-26.25, 26.25-27.25,27.25-
28.25,28.25-29.25, >29.25), and computed sample mean number of satellites for
female crabs in each category.
Generalized linear models (GLMs) have three components:
1. Random component — identifies the response variable Y and its probability distribution.
2. Systematic component — specifies the explanatory variables used in a linear predictor.
3. Link — describes the functional relation between the systematic component and the expected value (mean) of the random component.
Random Component
Let Y_1, …, Y_N denote the N observations on the response variable Y. The random component specifies a probability distribution for Y_1, …, Y_N. If the potential outcomes for each observation Y_i are binary, such as "success" or "failure", or, more generally, if each Y_i is a number of "successes" out of a certain fixed number of trials, we can assume a binomial distribution for the random component.
Systematic Component
This specifies the variables that play the roles of x_j in the linear predictor

α + β_1 x_1 + … + β_k x_k.
Link
This specifies how the mean μ relates to the explanatory variables in the linear predictor. The model formula states that

g(μ) = α + β_1 x_1 + … + β_k x_k.

Log link: g(μ) = log(μ) = α + β_1 x_1 + … + β_k x_k, which models the log of the mean. The log function applies to positive quantities, such as the means of count data.
Logit link: g(μ) = log[ μ/(1 − μ) ] = α + β_1 x_1 + … + β_k x_k, which models the log of an odds. A GLM that uses the logit link is called a logit model.
Each potential probability distribution for the random component has one special function of the mean called its natural parameter.
For the normal distribution, it is the mean itself.
For the Poisson, the natural parameter is the log of the mean.
For the binomial, the natural parameter is the logit of the success probability.
The link function that uses the natural parameter as g(μ) in the GLM is called the canonical link. Though other links are possible, in practice the canonical links are most common.
Example: Snoring and Heart Disease

                      Heart Disease
Snoring               Yes   No     Proportion   Linear Fit   Logit Fit   Probit Fit
Never                 24    1355   0.017        0.017        0.021       0.020
Occasional            35    603    0.055        0.057        0.044       0.046
Nearly every night    21    192    0.099        0.096        0.093       0.095
Every night           30    224    0.118        0.116        0.132       0.131
We use scores (0, 2, 4, 5) for the snoring categories, treating the last two levels as closer together than the other adjacent pairs. See the R code at the end of these notes for fitting these models.
Linear fit using maximum likelihood: π̂(x) = 0.0172 + 0.0198x. For a nonsnorer (x = 0), the estimated proportion of subjects having heart disease is π̂(0) = 0.0172 + 0.0198(0) = 0.0172.
Note that this ML linear fit differs slightly from the ordinary least squares fit.
Logistic Regression
The relationship between π(x) and x is usually nonlinear rather than linear. The most important function describing such a nonlinear relationship has the form

log[ π(x)/(1 − π(x)) ] = α + βx,

which is called the logistic regression function. These models are often referred to as logit models, as the link in this GLM is the logit link.
The parameter β determines the rate of increase or decrease of the logistic curve. When β > 0, π(x) increases as x increases; when β < 0, π(x) decreases as x increases. The magnitude of β determines how fast the curve increases or decreases: as |β| increases, the curve has a steeper rate of change.
Probit Models
The probability of success, π(x), has the form Φ(α + βx), where Φ is the cdf of a standard normal distribution N(0, 1). The probit transform maps π(x) so that the regression curve for π(x) (or for 1 − π(x) when β < 0) has the appearance of the normal cdf with mean μ = −α/β and standard deviation σ = 1/|β|.
Using the snoring and heart disease data, the probit model with scores (0, 2, 4, 5) for the snoring levels is

probit[ π̂(x) ] = −2.061 + 0.188x.

The fitted proportions are obtained from the standard normal table. For example, at snoring level x = 0 the fitted probit equals −2.061 + 0.188(0) = −2.06, which corresponds to a probability of 0.020. Also, at snoring level x = 5 the fitted probit equals −2.061 + 0.188(5) = −1.12, which corresponds to a fitted probability of 0.131.
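These fitted values follow from applying the standard normal cdf to the fitted probits, which in R is pnorm; a minimal check:

pnorm(-2.061 + 0.188 * c(0, 2, 4, 5))   # 0.020, 0.046, 0.095, 0.131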
Poisson Regression
This GLM assumes a Poisson distribution for the random component. Like counts, Poisson variates can take any nonnegative integer value, and the Poisson distribution has a positive mean. One can model the Poisson mean using the identity link, but it is more common to model the log of the mean. A Poisson loglinear model is a GLM that assumes a Poisson distribution for Y and uses the log link.
For this model we have

μ = exp(α + βx) = e^α (e^β)^x,

which shows that a one-unit increase in x has a multiplicative impact of e^β on μ: the mean of Y at x + 1 equals the mean of Y at x multiplied by e^β.
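A sketch of fitting such a model in R; the x values and counts below are hypothetical, made up purely to illustrate the call:

x <- c(1, 2, 3, 4, 5)                 # hypothetical explanatory variable
counts <- c(2, 3, 6, 7, 12)           # hypothetical response counts
pois_fit <- glm(counts ~ x, family = poisson(link = "log"))
exp(coef(pois_fit)["x"])              # multiplicative effect of a one-unit increase in x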
For certain types of events that occur over time, space, or some other index of size, it is often relevant to model the rate at which the events occur. For example, in modeling numbers of auto thefts in 2011 for a sample of cities, we could form a rate for each city by dividing the number of thefts by the city's population size.
The model describes how the rate depends on some other explanatory variables
such as the city’s unemployment rate, median income, and percentage of residents
having completed high school.
When a response count Y has index (such as population size) equal to t, the sample rate of outcomes is Y/t and the expected value of the rate is μ/t. A loglinear model for the expected rate has the form

log(μ/t) = α + βx,  i.e.  log μ − log t = α + βx.

The adjustment term −log t in this equation is called an offset. Statistical software can fit models having offsets.
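In R, an offset is supplied with the offset() term in the model formula; a sketch with hypothetical city-level data (thefts, pop, and unemp are made-up names and values):

thefts <- c(30, 65, 110)              # hypothetical theft counts
pop <- c(50000, 120000, 210000)       # population sizes (the index t)
unemp <- c(4.1, 6.3, 8.0)             # hypothetical unemployment rates
glm(thefts ~ unemp + offset(log(pop)), family = poisson)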
MODEL INFERENCE AND MODEL CHECKING
Deviance
This is a measure that summarizes the fit of GLMs. A saturated model has a
separate parameter for each observation giving a perfect fit.
Let θ̃ denote the estimate of θ for the saturated model, corresponding to estimated means μ̃_i = y_i for all i, and let θ̂ denote the MLE of θ for the model under consideration. The deviance of the fitted model is defined as

D(y; μ̂) = −2[ L(μ̂; y) − L(y; y) ]
 = 2 Σ_{i=1}^{N} [ y_i θ̃_i − b(θ̃_i) ]/a(φ) − 2 Σ_{i=1}^{N} [ y_i θ̂_i − b(θ̂_i) ]/a(φ).
Usually a(φ) has the form a(φ) = φ/w_i, and the scaled deviance equals

2 Σ_{i=1}^{N} w_i [ y_i(θ̃_i − θ̂_i) − b(θ̃_i) + b(θ̂_i) ] / φ.
When a model with log link contains an intercept term, the deviance simplifies to

D(y; μ̂) = 2 Σ_{i=1}^{N} y_i log(y_i/μ̂_i).
Deviance Residuals
Define

d_i = 2 w_i [ y_i(θ̃_i − θ̂_i) − b(θ̃_i) + b(θ̂_i) ].

The deviance residual for observation i is

sign(y_i − μ̂_i) √d_i.
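In R, the deviance and deviance residuals of a fitted GLM are available directly; a sketch reusing the hypothetical Poisson fit from the earlier sketch:

deviance(pois_fit)                          # D(y; mu-hat)
residuals(pois_fit, type = "deviance")      # deviance residual for each observation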
Some R Codes
# Snoring and heart disease data
snoring <- c(0, 2, 4, 5)
disease <- c(24, 35, 21, 30)
total <- c(1379, 638, 213, 254)
# Logit model
glm(cbind(disease, total - disease) ~ snoring,
    family = binomial(link = "logit"))
# Probit model
glm(cbind(disease, total - disease) ~ snoring,
    family = binomial(link = "probit"))
Reference: McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models, 2nd ed. London: Chapman and Hall.