BloodTypesCancerCaseStudy
Analysis of nominal data case study: blood type and cancer
Dr Alberto Corrias
BN2102 Bioengineering Data Analysis Analysis of nominal data case study: blood type and cancer 1 / 42
ABO blood types
† Diagram by InvictaHOG, in the public domain
ABO blood types by country
† Image credit: Rick Wicklin (SAS)
ABO blood types and cancer: a long history
Our case study
Data collection (1996-2005)
Our objective
Building the "expected" table (if H0 were true)
Expected incidence
Blood type   Healthy                        Pancreatic cancer   Total
O            46,329 - 128.05 = 46,200.95    128.05              46,329
A            38,785 - 107.20 = 38,677.80    107.20              38,785
B            8,497 - 23.48 = 8,473.52       23.48               8,497
AB           14,208 - 39.27 = 14,168.73     39.27               14,208
Total        107,521                        298                 107,819
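The expected counts can be reproduced from the row and column totals alone. The sketch below is a minimal illustration, assuming the observed counts tabulated elsewhere in these slides and the usual rule E = (row total × column total) / grand total; the function and variable names are illustrative and do not come from the lecture.

```python
# Observed (healthy, pancreatic cancer) counts per blood type, as tabulated in these slides.
observed = {
    "O": (46238, 91),
    "A": (38667, 118),
    "B": (8464, 33),
    "AB": (14152, 56),
}


def expected_table(obs):
    """Expected counts under H0: E[row][col] = T_row * T_col / T_total."""
    row_totals = {name: sum(counts) for name, counts in obs.items()}
    n_cols = len(next(iter(obs.values())))
    col_totals = [sum(counts[j] for counts in obs.values()) for j in range(n_cols)]
    grand_total = sum(col_totals)
    return {
        name: tuple(row_totals[name] * col / grand_total for col in col_totals)
        for name in obs
    }


expected = expected_table(observed)
for blood_type, (healthy, cancer) in expected.items():
    print(f"{blood_type:>2}  healthy = {healthy:10.2f}  cancer = {cancer:7.2f}")
```

Because each row of the expected table must add up to the observed row total, the healthy column can equally be obtained by subtracting the expected cancer count from the row total, which is how the table above is laid out.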
About the "expected" table
Building the "expected" table: practical tip
Building the χ² statistic
About the χ² statistic
Imagine we collect the following data
Observed incidence
Type    Healthy   Cancer   Total
O       100       50       150
A       100       50       150
B       100       50       150
AB      100       50       150
Total   400       200      600
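In this imagined data set every blood type has the same proportion of cancer cases, so the expected table coincides with the observed one and the statistic is zero. The following sketch checks this, assuming the standard Pearson form (sum over all cells of (O − E)² / E); the helper name is illustrative.

```python
# Rows: O, A, B, AB; columns: healthy, cancer (the imagined data above).
observed = [[100, 50], [100, 50], [100, 50], [100, 50]]


def chi2_statistic(table):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over all cells."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / total
            stat += (obs - exp) ** 2 / exp
    return stat


chi2 = chi2_statistic(observed)
print(chi2)
```

The larger the departure of the observed counts from the expected ones, the larger the statistic becomes; zero corresponds to perfect agreement with H0.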
Finding the χ²crit
If we choose α = 0.05, we identify χ²crit = 7.8.
Note: number of dof = (4-1)(2-1) = 3
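The critical value can also be computed rather than read off a table. The sketch below is one possible way to do it with the Python standard library only: it evaluates the upper-tail probability of the χ² distribution for an integer number of dof through a known recurrence and then bisects for the point where that tail equals α. The function names are illustrative assumptions, not part of the lecture.

```python
import math


def chi2_sf(x, dof):
    """Upper-tail probability P(X >= x) for a chi-square variable, integer dof >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2
    # Closed-form base cases: dof = 1 (odd) or dof = 2 (even).
    if dof % 2 == 1:
        tail, k = math.erfc(math.sqrt(half)), 1
    else:
        tail, k = math.exp(-half), 2
    # Recurrence: Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    while k < dof:
        tail += half ** (k / 2) * math.exp(-half) / math.gamma(k / 2 + 1)
        k += 2
    return tail


def chi2_critical(alpha, dof, lo=0.0, hi=200.0):
    """Bisect for the value whose upper-tail area equals alpha."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if chi2_sf(mid, dof) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


dof = (4 - 1) * (2 - 1)  # (rows - 1) * (columns - 1)
chi2_crit = chi2_critical(0.05, dof)
print(dof, round(chi2_crit, 3))
```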
Performing the χ² test
† The exact p value can be computed using any statistical software.
Rejection of the NULL hypothesis: statistical meaning
Statistical meaning
If, in fact, the distribution of blood types among pancreatic
cancer patients is the same as for healthy individuals, then the
probability of randomly drawing samples from the population that lead
to a χ² value at least as large as the one we computed is less than 5%
(the exact value of this probability is the p value).
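To make the statement above concrete, the sketch below runs the whole test on the blood type data from these slides: it computes the χ² statistic, compares it with the critical value for α = 0.05 and 3 dof, and evaluates the p value. It assumes the Pearson form of the statistic and uses the closed-form upper tail of the χ² distribution with 3 dof; the names are illustrative.

```python
import math

# Observed (healthy, pancreatic cancer) counts, from the slides.
observed = [
    [46238, 91],   # O
    [38667, 118],  # A
    [8464, 33],    # B
    [14152, 56],   # AB
]


def chi2_statistic(table):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over all cells."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / total
            stat += (obs - exp) ** 2 / exp
    return stat


def chi2_sf_3dof(x):
    """P(X >= x) for a chi-square variable with 3 dof (closed form)."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


chi2 = chi2_statistic(observed)
p_value = chi2_sf_3dof(chi2)
chi2_crit = 7.815  # alpha = 0.05, dof = 3 (quoted as 7.8 in the slides)
reject_h0 = chi2 > chi2_crit
print(f"chi2 = {chi2:.2f}, p = {p_value:.2e}, reject H0: {reject_h0}")
```

Comparing the statistic with the critical value and comparing the p value with α are two views of the same decision: the statistic exceeds χ²crit exactly when p is below α.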
Rejection of the NULL hypothesis: how it is reported
† Dandona et al. JNCI: 102(2):135-137. 2010
Of rows and columns...
Consider the following contingency tables
Table 1
         X      Y
Type 1   160    100
Type 2   240    50
Type 3   1000   185
Type 4   580    98

Table 2
    Type 1   Type 2   Type 3   Type 4
X   160      240      1000     580
Y   100      50       185      98

2 < 84.4
3 = 84.4
4 It is impossible to tell
Answer: 84.4! Transposing the contingency table does not change the
result.
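A quick numerical check of this answer: the sketch below computes the Pearson χ² statistic for Table 1 and for its transpose (Table 2). The helper name is illustrative; the data are the two tables above.

```python
table_1 = [[160, 100], [240, 50], [1000, 185], [580, 98]]  # rows: Type 1..4; columns: X, Y
table_2 = [list(col) for col in zip(*table_1)]              # transpose: rows X, Y


def chi2_statistic(table):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over all cells."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / total
            stat += (obs - exp) ** 2 / exp
    return stat


chi2_1 = chi2_statistic(table_1)
chi2_2 = chi2_statistic(table_2)
print(round(chi2_1, 1), round(chi2_2, 1))
```

The two values agree because each cell's expected count depends only on its own row and column totals, which simply swap roles under transposition. The number of dof, (rows − 1)(columns − 1), is unchanged as well.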
Pairwise comparisons
Pairwise comparisons: Type O versus Type A
Original Data
Type    Healthy   Cancer   Total
O       46,238    91       46,329
A       38,667    118      38,785
B       8,464     33       8,497
AB      14,152    56       14,208
Total   107,521   298      107,819

Type O Vs Type A
Type    Healthy   Cancer   Total
O       46,238    91       46,329
A       38,667    118      38,785
Total   84,905    209      85,114
The expected table for the Type O versus Type A case is calculated
(as before) by applying the formula
$$E_{row,col} = \frac{T_{row}\, T_{col}}{T_{total}}$$
Type O versus Type A: building the expected table
Type O versus Type A: computing the χ² statistic
This is now a 2×2 case, so we need to apply the Yates correction.
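The Type O versus Type A comparison can be sketched as follows. The code builds the expected 2×2 table from the row and column totals and then applies the Yates correction in its textbook form, summing (|O − E| − 0.5)² / E over the four cells. That form is an assumption about what the slides use; the names are illustrative.

```python
# Observed (healthy, cancer) counts for the two blood types being compared.
type_o = (46238, 91)
type_a = (38667, 118)


def expected_2x2(row_a, row_b):
    """Expected counts: E[row][col] = T_row * T_col / T_total."""
    table = [row_a, row_b]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def yates_chi2(row_a, row_b):
    """Yates-corrected statistic: sum of (|O - E| - 0.5)^2 / E over the 4 cells."""
    table = [row_a, row_b]
    expected = expected_2x2(row_a, row_b)
    return sum(
        (abs(table[i][j] - expected[i][j]) - 0.5) ** 2 / expected[i][j]
        for i in range(2)
        for j in range(2)
    )


expected_oa = expected_2x2(type_o, type_a)
chi2_oa = yates_chi2(type_o, type_a)
print(f"expected cancer counts: O = {expected_oa[0][1]:.2f}, A = {expected_oa[1][1]:.2f}")
print(f"Yates-corrected chi2 = {chi2_oa:.2f}")
```

With only two rows and two columns there is (2 − 1)(2 − 1) = 1 dof, so the statistic is compared against a χ² distribution with 1 dof.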
All pairwise comparisons
† See ANOVA lecture: every pairwise test carries the chance of committing a type I error.
Determining χ²crit−BONF (e.g., for a case of 95% confidence)
[Figure: χ²(1) density with χ²crit and χ²crit−BONF marked on the horizontal axis. The green area to the right of χ²crit−BONF is 0.05/6.]
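The Bonferroni-corrected critical value can be obtained numerically. The sketch below assumes 1 dof (each pairwise comparison is a 2×2 table) and 6 comparisons among the 4 blood types, and bisects on the upper tail of the χ²(1) distribution, which has the closed form erfc(√(x/2)). The function names are illustrative.

```python
import math


def chi2_sf_1dof(x):
    """P(X >= x) for a chi-square variable with 1 dof."""
    return math.erfc(math.sqrt(x / 2)) if x > 0 else 1.0


def chi2_critical_1dof(alpha, lo=0.0, hi=200.0):
    """Bisect for the value whose upper-tail area equals alpha."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if chi2_sf_1dof(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


n_comparisons = math.comb(4, 2)  # 6 pairs among O, A, B, AB
alpha = 0.05
chi2_crit = chi2_critical_1dof(alpha)
chi2_crit_bonf = chi2_critical_1dof(alpha / n_comparisons)
print(f"chi2_crit = {chi2_crit:.2f}, chi2_crit_BONF = {chi2_crit_bonf:.2f}")
```

The corrected threshold lies further to the right than the uncorrected one, so each individual pairwise test has to clear a higher bar before H0 is rejected.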
Performing 6 pairwise χ² tests
Statistical statements
Examining the rejection cases
Observed incidence
Type    Healthy   Cancer   Total
O       46,238    91       46,329
A       38,667    118      38,785
B       8,464     33       8,497
AB      14,152    56       14,208
Total   107,521   298      107,819

Expected incidence
Type    Healthy      Cancer   Total
O       46,200.95    128.05   46,329
A       38,677.80    107.20   38,785
B       8,473.52     23.48    8,497
AB      14,168.73    39.27    14,208
Total   107,521      298      107,819

We observe two aspects (from previous slide):
Comparisons with O type yielded statistically significant differences.
For type O, the observed cancer count is less than the expected, while for all other types the observed cancer count is more than the expected.
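Both observations can be checked with a short script. The sketch below runs the six pairwise 2×2 tests with the textbook Yates correction and rejects H0 when the p value (1 dof) falls below the Bonferroni-adjusted level 0.05/6. Deciding via the p value rather than via χ²crit−BONF is an equivalent formulation chosen here for brevity; the names are illustrative.

```python
import math
from itertools import combinations

# Observed (healthy, cancer) counts, from the table above.
observed = {"O": (46238, 91), "A": (38667, 118), "B": (8464, 33), "AB": (14152, 56)}


def yates_chi2(row_a, row_b):
    """Yates-corrected chi-square for a 2x2 table: sum of (|O - E| - 0.5)^2 / E."""
    table = [row_a, row_b]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            exp = row_totals[i] * col_totals[j] / total
            stat += (abs(table[i][j] - exp) - 0.5) ** 2 / exp
    return stat


pairs = list(combinations(observed, 2))  # 6 pairs of blood types
alpha_bonf = 0.05 / len(pairs)

rejected = []
for a, b in pairs:
    stat = yates_chi2(observed[a], observed[b])
    p = math.erfc(math.sqrt(stat / 2))  # upper tail of chi-square with 1 dof
    if p < alpha_bonf:
        rejected.append((a, b))
    print(f"{a:>2} vs {b:<2}  chi2 = {stat:6.2f}  p = {p:.4f}  reject H0: {p < alpha_bonf}")
```

Under these assumptions only the three pairs involving type O are rejected, in line with the first observation above.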
How it is reported
A similar case study: cancer and ethnicity
Calculation question (3 points)
Calculation question: solution
                 Healthy   Cancer   Total
White            2,500     195      2,695
Black            3,500     220      3,720
Asian American   540       20       560
Total            6,540     435      6,975

Computing the expected table:
$$E_{1,1} = \frac{2695 \times 6540}{6975} = 2526.92, \quad E_{2,1} = \frac{3720 \times 6540}{6975} = 3488, \ \ldots$$

Expected
                 Healthy    Cancer   Total
White            2,526.92   168.08   2,695
Black            3,488      232      3,720
Asian American   525.08     34.92    560
Total            6,540      435      6,975
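The expected table above can be reproduced, and the test carried through, with the same steps used for the blood type data. The sketch below is a hedged illustration and not the official solution: it assumes the Pearson statistic with (3 − 1)(2 − 1) = 2 dof, for which the upper-tail probability has the closed form exp(−χ²/2). The names are illustrative.

```python
import math

# Observed (healthy, cancer) counts from the question.
observed = {
    "White": (2500, 195),
    "Black": (3500, 220),
    "Asian American": (540, 20),
}

row_totals = {name: sum(counts) for name, counts in observed.items()}
col_totals = [sum(counts[j] for counts in observed.values()) for j in range(2)]
grand_total = sum(col_totals)

# Expected counts under H0: E[row][col] = T_row * T_col / T_total.
expected = {
    name: tuple(row_totals[name] * col / grand_total for col in col_totals)
    for name in observed
}

# Pearson chi-square statistic over all six cells.
chi2 = sum(
    (observed[name][j] - expected[name][j]) ** 2 / expected[name][j]
    for name in observed
    for j in range(2)
)

dof = (len(observed) - 1) * (2 - 1)
p_value = math.exp(-chi2 / 2)  # upper tail of chi-square, valid for dof = 2 only

for name, (healthy, cancer) in expected.items():
    print(f"{name:<15} healthy = {healthy:8.2f}  cancer = {cancer:6.2f}")
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_value:.4f}")
```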
Calculation question: solution (cont’d)
Calculation question: solution (not required by the question)
Calculation question: solution (cont’d)
Computation of the χ² statistic should have been made with the
Yates correction. For example,
Testing against a known distribution
Our case study so far
Blood type   Healthy   Pancreatic cancer   Total
O            46,238    91                  46,329
A            38,667    118                 38,785
B            8,464     33                  8,497
AB           14,152    56                  14,208
Total        107,521   298                 107,819
Testing against known distribution
Blood type   National average (%)   Pancreatic cancer patients
O            43%                    91
A            36%                    118
B            7.9%                   33
AB           13.1%                  56
Total        100%                   298
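When the reference distribution is known, the expected counts come directly from the national percentages applied to the 298 patients. The sketch below assumes the usual goodness-of-fit form of the test, with (number of categories − 1) = 3 dof and the Pearson statistic; the names are illustrative.

```python
import math

# National averages (fractions) and observed pancreatic cancer patients, from the table above.
national = {"O": 0.43, "A": 0.36, "B": 0.079, "AB": 0.131}
patients = {"O": 91, "A": 118, "B": 33, "AB": 56}

n_patients = sum(patients.values())

# Expected count per blood type if patients followed the national distribution.
expected = {bt: n_patients * frac for bt, frac in national.items()}

# Pearson chi-square statistic over the four categories.
chi2 = sum((patients[bt] - expected[bt]) ** 2 / expected[bt] for bt in patients)

dof = len(patients) - 1
# Upper tail of chi-square with 3 dof (closed form).
p_value = math.erfc(math.sqrt(chi2 / 2)) + math.sqrt(2 * chi2 / math.pi) * math.exp(-chi2 / 2)
chi2_crit = 7.815  # alpha = 0.05, dof = 3

for bt in patients:
    print(f"{bt:>2}  observed = {patients[bt]:3d}  expected = {expected[bt]:7.2f}")
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_value:.2e}, reject H0: {chi2 > chi2_crit}")
```

Unlike the contingency-table version, only the patient counts enter the statistic here; the healthy group is replaced by the known national percentages.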
Testing against a known distribution: building the χ² statistic
Testing against a known distribution
† One can also proceed with pairwise comparisons, as we did earlier in this class; we will not cover that for this case, though.
Lecture Summary