Week 01
ISYE6421
• TA: TBD
My academic pathway
Agenda
• Course organization
• Introduction to Biostatistics
Organization of the Course
Topics:
• Biostatistical Design of Medical Studies
• Software: R (or SAS, BUGS, MATLAB)
• Basic statistical inference
• Categorical Data: OR, RR, Mantel-Haenszel
• Continuous Data: Parametric/Nonparametric
• Review of Linear and Logistic Regression
• Multiple Comparisons
• False Discovery Rate
• Survival analysis: censoring, Kaplan-Meier, Cox.
• Applications (Genetics, -omics, ChIP-Seq)
• Longitudinal data analysis
• Sample size calculation for studies
• Brief Introduction to Causal Inference
Course Organization (cont.)
Grading:
• Homework: 20%
• Introduction to Biostatistics
What is Statistics?
Examples:
• Parents of a child with a genetic defect consider
whether or not they should have another child
• To choose the best therapy, a physician must
compare the prognosis, or future course, of a
patient under several therapies
• Does smoking cause cancer?
Key elements: uncertainty, variation, inference.
Some Definitions of Statistics:
• “May be regarded as mathematics applied to
observational data … as the study of (1) population; (2)
variation; (3) methods of the reduction of data.” (Fisher)
• “= Uncertainty and Behavior.” (Savage)
• “the art of learning from data.” (Ross)
Why Biostatistics?
Goal of this course
Some Biostatistical Problems:
1. Is a new drug more effective in treating an illness than a
previously used drug?
2. Does the use of a seat belt decrease the chance of death
in a car accident?
3. How does the frequency of laboratory tests influence the
quality of medical care?
4. Are growth curves for boys and girls significantly
different?
Lecture 1
Types of Studies: I
• Observational Study: collects data from an
existing situation. The data collection does not
intentionally interfere with the running of the
system
(Remark: the act of observation may introduce change
into a system)
• Experiment: a study in which an investigator
deliberately sets one or more factors to a
specific level
(Remark: experiments lead to stronger scientific
inferences than do observational studies, which are
always open to misinterpretation due to a lack of
knowledge in a given field)
Three types of Experiments
• Laboratory experiment: an experiment that takes
place in an environment (called a laboratory) where
experimental manipulation is facilitated
Types of Studies: II
Types of Studies: III
Cohort Study / Prospective Study
Cohort Study: Example
• Important: in a cohort study, information about the risk
factor (the exposure) is determined prior to the
observation of disease status.
• A prospective study can also be conducted from existing data:
a historical prospective study
Cohort Studies: Advantage
Advantage:
• Strongest observational design for establishing
cause-and-effect relationships
• Very efficient for study of rare exposure
• Clear temporal relationship between exposure
and disease
• Can yield information on multiple exposures, or
on multiple outcomes of a particular exposure
• May yield information on incidence of disease
Cohort Studies: disadvantage
Disadvantage:
• Time consuming
• Often require a large sample size
• Expensive
• Not efficient for the study of rare disease
• Losses to follow-up may diminish validity
• Change over time in diagnostic methods may
lead to biased results
Case-Control / Retrospective Studies
Case-Control : Example
Case-Control Studies: Advantage
Advantage:
• Efficient for the study of rare disease
• Efficient for the study of chronic disease
• Tend to require a smaller sample size than
cohort studies
• Less expensive than cohort studies
• May be completed more rapidly than cohort
studies
Case-Control Studies: Disadvantage
Disadvantage:
• Risk of disease cannot be estimated directly
• Not efficient for the study of rare exposures
• More susceptible to selection bias than cohort
studies
• Information on exposure may be less accurate
than that available in cohort studies
Cohort vs. Case-Control
              Disease +   Disease -
Exposure +       n11         n12
Exposure -       n21         n22
(Row totals are fixed in a cohort study.)
Steps necessary to perform a study
Ethical Issues
• Data collection: Design of forms (what data are to
be collected, clarity of questions, pretesting of forms
and pilot studies, layout and appearance).
• Data editing and verification (validity check,
consistency check, missing forms)
• Data handling
• Amount of Data collected: sample size
• Inference from a study
Summary
Topic: 2x2 Table
• Technical Notation:
➢ Relative Risk (RR): can be estimated in
cohort studies, but not in case-control
studies
➢ Odds Ratio (OR): can be estimated in both
cohort and case-control studies
Recall: Cohort vs. Case-Control
              Disease +   Disease -
Exposure +       n11         n12
Exposure -       n21         n22
Comparing Two Proportions
Example: ABO Hemolytic
Example
                 ABO Hemolytic Disease
                  Yes       No     Total
Black infant       43     3541      3584
White infant       17     3814      3831
(The row variable, race, is the exposure.)
Four Methods:
• Small sample: Fisher’s exact test
• Large sample: Three tests
I. Fisher’s exact test
Disease + Disease -
Reject H0 at the 5% level, i.e., there is a racial difference in disease
rates: the true odds ratio is not equal to 1, so the odds of disease differ
between the two races
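As a rough illustration of how such a test could be computed for the ABO table (43, 3541 / 17, 3814), here is a minimal sketch using only the Python standard library. The function name `fisher_exact_two_sided` is hypothetical, and the two-sided rule used (sum the probabilities of all tables no more likely than the observed one) is one common convention, not necessarily the one used on the slide.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability that the top-left cell equals k,
        # given fixed row and column totals
        return comb(row1, k) * comb(row2, col1 - k) / denom

    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    p_obs = prob(a)
    # Sum over all tables that are no more likely than the observed table
    # (small relative tolerance guards against floating-point ties)
    return sum(p for p in map(prob, range(k_min, k_max + 1))
               if p <= p_obs * (1 + 1e-7))

# ABO hemolytic disease example: rows are Black / White infants
p_value = fisher_exact_two_sided(43, 3541, 17, 3814)
print(f"two-sided p-value = {p_value:.6f}")
```

A p-value below 0.05 from this sketch would be consistent with rejecting H0 at the 5% level, as stated above.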
II. Large Sample Test A
Disease + Disease -
Under H0
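The details of this test did not survive on the slide. One standard large-sample test that estimates the variance under H0 is the pooled two-proportion z-test. The sketch below applies it to the ABO data on the assumption that this is the kind of test meant; the function name `two_proportion_z_test` is hypothetical.

```python
from math import sqrt, erfc

def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled z-test of H0: p1 = p2 for two independent binomial samples."""
    p1, p2 = x1 / n1, x2 / n2
    # Under H0 both groups share one proportion, estimated by pooling
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value

# ABO example: 43 of 3584 Black infants, 17 of 3831 White infants
z, p_value = two_proportion_z_test(43, 3584, 17, 3831)
print(f"z = {z:.4f}, two-sided p-value = {p_value:.6f}")
```

The square of this z statistic equals the Pearson chi-square statistic without continuity correction for the same table.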
Large Sample Test B: Example
IV: Chi-Square Test of independence
Chi-Square Test of independence: Example
data: data1
X-squared = 12.2615, df = 1, p-value = 0.0004624
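The statistic reported above appears to match a chi-square test with Yates' continuity correction applied to the ABO table. The following sketch, assuming that table is what `data1` holds, recomputes it with the Python standard library; the function name `chi_square_2x2` is hypothetical.

```python
from math import sqrt, erfc

def chi_square_2x2(a, b, c, d, yates=True):
    """Chi-square test of independence for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Yates' continuity correction shrinks |ad - bc| by n/2
        diff = max(0.0, diff - n / 2)
    stat = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of the chi-square distribution with 1 degree of freedom
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value

stat, p_value = chi_square_2x2(43, 3541, 17, 3814)
print(f"X-squared = {stat:.4f}, df = 1, p-value = {p_value:.7f}")
```

Passing `yates=False` gives the uncorrected Pearson statistic, which is somewhat larger for this table.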
Summary: Tests of Two ind. Bin RV
Measures of Effects for Bin RV
1. Risk Difference
• Let
p1 = probability of developing disease
for exposed individuals;
p2 = probability of developing disease
for unexposed individuals
• Risk Difference = p1 – p2
2. Relative Risk
3. Odds Ratio
odds = probability of success / probability of failure
Odds ratio = odds of exposed group / odds of unexposed group
and is estimated by (n11 × n22) / (n12 × n21), in the 2x2 table notation used earlier
Example (Continued)
                 ABO Hemolytic Disease
                  Yes       No     Total
Black infant       43     3541      3584
White infant       17     3814      3831
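The three measures of effect can be computed directly from the table above. This is a minimal sketch that treats Black infants as the exposed group (p1) and White infants as the unexposed group (p2); the function name `effect_measures` is hypothetical.

```python
def effect_measures(a, b, c, d):
    """Risk difference, relative risk, and odds ratio for [[a, b], [c, d]].

    Rows are exposed / unexposed; columns are disease yes / no.
    """
    p1 = a / (a + b)  # risk of disease among the exposed
    p2 = c / (c + d)  # risk of disease among the unexposed
    return {
        "risk_difference": p1 - p2,
        "relative_risk": p1 / p2,
        "odds_ratio": (a * d) / (b * c),
    }

measures = effect_measures(43, 3541, 17, 3814)
for name, value in measures.items():
    print(f"{name}: {value:.4f}")
```

Because the disease is rare in both groups, the odds ratio and the relative risk come out close to each other here.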
Hypothetical Case-Control Study
Sample          Disease +   Disease -
Exposed +           a           b
Exposed -           c           d
Population      Disease +   Disease -
Exposed +           A           B
Exposed -           C           D
This is a case-control study, so sampling starts from the outcome: the column totals (cases and controls) are fixed by design, and the row proportions vary as exposure is traced back from the cases and controls.
Sample RR and Population RR
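Sample          Disease +   Disease -
Exposed +           a           b
Exposed -           c           d
Population      Disease +   Disease -
Exposed +           A           B
Exposed -           C           D
a = f1 A, c = f1 C, b = f2 B, d = f2 D, where f1 and f2 are the sampling fractions for cases and controls
(What the sample estimates directly is the probability that a person who has the disease has the exposure.)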
Relative risk estimation for a case-control study
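To see why the relative risk computed from a case-control sample generally differs from the population relative risk, the sketch below applies a = f1 A, c = f1 C, b = f2 B, d = f2 D to a made-up population. The counts and sampling fractions are hypothetical and chosen only for illustration.

```python
from fractions import Fraction

def relative_risk(a, b, c, d):
    """Relative risk from [[a, b], [c, d]]: rows exposed / unexposed."""
    return Fraction(a, a + b) / Fraction(c, c + d)

# Hypothetical population counts (A, B, C, D)
A, B, C, D = 100, 9900, 100, 19900

# Hypothetical sampling fractions: all cases, 1% of controls
f1, f2 = Fraction(1), Fraction(1, 100)
a, b, c, d = f1 * A, f2 * B, f1 * C, f2 * D

rr_population = relative_risk(A, B, C, D)
rr_sample = relative_risk(a, b, c, d)
print(f"population RR = {float(rr_population):.4f}")
print(f"sample RR     = {float(rr_sample):.4f}")
```

The two values agree only when f1 = f2, which is not how a case-control study is sampled.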
Hypothetical Case-Control Study
Sample OR and Population OR
Sample          Disease +   Disease -
Exposed +           a           b
Exposed -           c           d
Population      Disease +   Disease -
Exposed +           A           B
Exposed -           C           D
a = f1 A, c = f1 C, b = f2 B, d = f2 D
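With a = f1 A, c = f1 C, b = f2 B, d = f2 D, the sample odds ratio is ad/(bc) = (f1 A)(f2 D) / ((f2 B)(f1 C)) = AD/(BC), so the sampling fractions cancel. The sketch below checks this numerically with exact arithmetic; the population counts and the fractions tried are hypothetical.

```python
from fractions import Fraction

def odds_ratio(a, b, c, d):
    """Odds ratio (a*d)/(b*c) for the 2x2 table [[a, b], [c, d]]."""
    return Fraction(a * d, b * c)

# Hypothetical population counts (A, B, C, D)
A, B, C, D = 100, 9900, 100, 19900
or_population = odds_ratio(A, B, C, D)

# Try several hypothetical pairs of sampling fractions (f1 for cases, f2 for controls)
sample_ors = []
for f1, f2 in [(Fraction(1), Fraction(1, 100)),
               (Fraction(1, 2), Fraction(1, 50)),
               (Fraction(3, 4), Fraction(1, 10))]:
    a, b, c, d = f1 * A, f2 * B, f1 * C, f2 * D
    sample_ors.append(odds_ratio(a, b, c, d))

print(f"population OR = {float(or_population):.4f}")
print("sample ORs    =", [f"{float(x):.4f}" for x in sample_ors])
```

Every sample odds ratio equals the population value, which is why the OR can be estimated from a case-control design while the RR cannot.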
Hypothetical Case-Control Study: OR
Estimation of OR
Sample          Disease +   Disease -
Exposed +           a           b
Exposed -           c           d
Summary: RR and OR
Disease + Disease - Total
Example: Smoking-Perinatal Mortality
Smoking-Perinatal Mortality: OR
Smoking-Perinatal Mortality: Tests
Smoking-Perinatal Mortality: CI
Summary