Study Designs

Epidemiologic studies
Study design
A study design is a specific plan or protocol for conducting
the study, which allows the investigator to translate the
conceptual hypothesis into an operational one.
It is a framework that guides the researcher through the various
stages of the research process.
Types of Epidemiologic design strategies
1. Descriptive Epidemiology
• Descriptive epidemiology is a way of organizing data related
to health and health-related events by
  - person (Who),
  - place (Where) and
  - time (When) in a population.
Time
• Information organized by time easily shows the trend of the
disease over time and establishes the usual occurrence of the
disease in the population which is essential in identifying excess
occurrence (epidemics).
• It can also be used to predict seasonal and secular (long-term)
trends.
Place
• This provides information on the geographic distribution of the
disease.
• Such information provides clues for identifying factors
influencing the occurrence of the disease, either in the host
or in the environment.
Fig 1 HIV prevalence by Administrative Areas - 2011 EDHS
Person
• Describing disease occurrence by personal characteristics
• It is important to identify modifiable factors in order
to prevent or control the disease.
Types of descriptive studies
• There are three main types of descriptive studies:
  - Case report or case series
  - Correlational/ecological
  - Cross-sectional
A. Case Report
• A case report consists of a careful, detailed report by one
or more clinicians of the profile of a single patient.
• More emphasis is given to unusual findings.
Potential case reports
- Uncommon presentation of a common disease
- Typical presentation of a rare disease
- A new disease
Example
• One case of pulmonary embolism observed 5 weeks after a
woman started using oral contraceptives was the first clue to
the association between oral contraception and increased
risk of venous thromboembolism, an established fact today.
B. Case Series
• Describes the experience of a group of patients with a similar
diagnosis or health problem, derived from either the
practice of one or more health care professionals or a
defined health care setting such as a hospital, health centre or
specialized clinic.
Example
• The 5 young homosexual men with PCP (Pneumocystis carinii
pneumonia) seen between Oct. 1980 and May 1981 in Los
Angeles created serious concern among physicians, since
PCP among young adults is not common.
• Later, with further follow-up and thorough investigation of
the unusual cluster of cases, the diagnosis
of AIDS was established for the first time.
Strength
1. Useful for studying signs and symptoms and creating case
definitions for epidemiological studies.
2. Case series that include cases at various stages of an illness, from
mild to fatal, supplemented by investigation of the past
medical history of these cases and observation until death (with
autopsy as appropriate), can help build up a picture of the
natural history of a disease.
3. Very useful in providing critical information for hypothesis
generation and for sound analytical studies.
Limitations
1. The report is based on few patients, so the findings could have arisen just by
chance.
2. Interpretation of information from case series is severely
limited by the lack of an appropriate comparison group.
3. Detailed and complete risk factor information is difficult to
obtain for all cases from records.
4. Studies are prone to the atomistic fallacy: the forces that cause
or prevent disease at an individual level may be different from
those that work at the societal level.
Example
• At an individual level, a high income may be associated with a
lower rate of suicide, but this does not mean that societies
that are rich have a lower rate of suicide or better mental
health.
C. Correlational or Ecological
• Uses data from entire populations to compare disease
frequencies between different groups during the same
period of time, or in the same population at different points in
time.
• Does not provide individual data; rather, it presents the average
exposure level in the community.
• Cause cannot be ascertained or established.
• The correlation coefficient (r) is the measure of association
in correlational studies.
• It is important to note that the presence of a correlation does
not necessarily imply a valid statistical association.
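To make the correlation coefficient concrete, here is a minimal Python sketch of Pearson's r computed on group-level values. The function name `pearson_r` and the four community values are hypothetical, invented only for illustration; they are not data from any study mentioned in these slides.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares around the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical community-level averages (invented for illustration only):
salt_g_per_day = [6.0, 8.0, 10.0, 12.0]      # average salt intake per community
hypertension_pct = [15.0, 18.0, 24.0, 27.0]  # hypertension prevalence per community

r = pearson_r(salt_g_per_day, hypertension_pct)
print(f"r = {r:.2f}")
```

Each pair of values describes a whole community, so even a value of r close to 1 says nothing about whether the individuals who eat more salt are the ones with hypertension.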
The two key features that distinguish a traditional ecologic
study from other types of epidemiologic studies are:
- The population as the unit of analysis
- An exposure status that is the property of the
population
Examples
• Hypertension rates and average per capita salt consumption
compared between two communities.
• Alemu et al., Climatic variables and malaria transmission dynamics
in Jimma town. Parasites & Vectors 2011, 4:30
  - The study was initiated to examine the relationship between
meteorological factors (monthly minimum, maximum and mean
temperature, total rainfall and relative humidity) and malaria case
occurrence over the last decade in Jimma town, Southwest Ethiopia.
Strength
• Can be done quickly and inexpensively, often using available
data.
Limitation
1. Inability to link exposure with disease
• Data on exposure and outcome are not linked at the
individual level; an association found with aggregate data may
not apply to individuals (this is referred to as the ecological
fallacy or ecological bias).
• The investigator cannot fill in the cells of a two-by-two
table from the data available in a traditional ecologic study.
• For example, in the association between high salt
consumption and hypertension, it is difficult to know
whether the risk is higher among the individual men and women
who have a high intake of salt.
2. Lack of ability to control for the effects of potential
confounding factors.
• There may be other factors that are the true cause.
• For example, people with high salt consumption
often also have high meat consumption.
D. Cross Sectional Studies
• A cross-sectional study investigates disease and risk factor
(exposure) patterns in a representative sample of a
population in a narrowly defined time period.
• An ideal cross-sectional study is done on a geographically
defined population.
Cross-sectional Design
[Figure: at a single point in time, the study population is classified into disease and no disease, each with the factor present or absent. The study only exists at this point in time.]
• Populations are commonly selected without regard to
exposure or disease status.
• Exposure and disease status are assessed simultaneously.
• Usually current disease status is examined in relation to
current exposure level.
• It is also possible to examine disease prevalence in relation
to past exposure if the dates of the exposure are ascertained.
• It measures prevalence, not incidence, of disease.
• A cross-sectional study conducted to estimate prevalence is
called a prevalence study.
• Findings from a cross-sectional study of a sample population
can be generalized cautiously if the basic characteristics of the
populations are similar.
• Cross-sectional studies may be a snapshot, but data collection often extends
over a period of time ranging from a few days to
several years.
• They can be useful to identify associations, generate and test
hypotheses and, by repeating at different time periods,
measure change and hence evaluate interventions.
Types of cross-sectional studies
1. Single cross-sectional studies
• Determine single proportion or mean in a single population at a
single point in time.
2. Comparative cross-sectional studies
• Determine two proportions or means in two populations at a
single point in time
3. Time series cross-sectional studies
• Determine a single proportion or mean in a single population at
multiple points in time
Comparative cross-sectional studies
• A cross-sectional study comparing a factor of interest between two
population groups
Examples:
1. Urban Vs Rural
2. Educated Vs Non educated
3. Public Vs Private service utilization
4. Model Vs Non model households
5. First year Vs Final year students
• Sample size is calculated using the double population
proportion formula and allocated proportionally.
• You can use more than one sampling technique for each group.
• In the analysis:
  - You can fit the model independently for each group to see
the effect of factors.
  - You can fit a single model to assess the overall factors associated with the outcome
variable.
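The slides name the double population proportion formula without stating it. A minimal Python sketch follows, assuming the commonly used form n = (Z_{α/2} + Z_β)² [p1(1 − p1) + p2(1 − p2)] / (p1 − p2)² per group with equal group sizes. The function name, the default significance level and power, and the example proportions (50% and 40%) are assumptions made for illustration. `statistics.NormalDist` requires Python 3.8 or later.

```python
from math import ceil
from statistics import NormalDist

def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Per-group sample size for comparing two independent proportions
    (assumed form of the double population proportion formula)."""
    if p1 == p2:
        raise ValueError("p1 and p2 must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)           # desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

# Hypothetical expected proportions in the two comparison groups
n_per_group = sample_size_two_proportions(0.50, 0.40)
print(n_per_group)
```

This sketch returns an equal size for each group. Allocating the total proportionally to the size of each population, as the slide suggests, would be a further step that is not shown here.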
Advantages
• One-stop, one-time collection of data
• Less expensive and easier to conduct
• Provide much information useful for planning health
services and medical programs
• Can be used to compare populations with different
characteristics, as in comparative cross-sectional studies.
Limitation
1. "Chicken or egg" dilemma
• Difficult to know which occurred first, the determinant or
the outcome.
• Therefore, it is difficult to distinguish whether the exposure
preceded the development of the disease or whether the presence
of the disease affected the individual's level of exposure.
Example
• In a study of knowledge of modern contraceptives and use
of contraception, you may show that women who know
about modern contraception are more likely to use it.
• So you may want to educate women about it, believing that
this will lead to a higher rate of use.
• The problem is, did the women know about it and then start
to use it, or did they learn about it because they were using
it?
• The temporal inference problem can be avoided if an inalterable
characteristic is the focus of the investigation.
2. It may not show strong cause-effect relationships if the sample
size is small.
– Usually a cross-sectional study shows association, not
causality.
3. Healthy worker survivor effect
• In studies conducted in occupational settings, because these studies
include only current and not former workers, the results may be
influenced by the selective departure of sick individuals from the
workforce.
  - Those who remain employed tend to be healthier than those who
leave employment.
  - This phenomenon, known as the "healthy worker survivor effect,"
generally attenuates an adverse effect of an exposure.
Analysis
• Either compare the prevalence of the outcome in exposed
vs non-exposed, or
• Compare the prevalence of the exposure in those with and
without the outcome.
• The timing of the subdivision of the study population into
comparison groups distinguishes cross-sectional studies
from other observational analytic studies.
• In cohort and case-control studies, this takes place prior to
the data collection process.
• In a cross-sectional study, this takes place after the
information has been collected.
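The cross-sectional analysis described above can be sketched in Python as follows. The records and counts are hypothetical, invented purely for illustration. Exposure and outcome are recorded together for each person, and the split into comparison groups happens only afterwards, at analysis time.

```python
# Hypothetical survey records (invented counts): exposure and outcome status
# are recorded for each person at the same visit.
records = (
    [{"exposed": True, "diseased": True}] * 30
    + [{"exposed": True, "diseased": False}] * 170
    + [{"exposed": False, "diseased": True}] * 20
    + [{"exposed": False, "diseased": False}] * 380
)

def prevalence(records, exposed):
    """Prevalence of the outcome among records with the given exposure status."""
    group = [r for r in records if r["exposed"] == exposed]
    return sum(r["diseased"] for r in group) / len(group)

# The subdivision into exposed and non-exposed is done after data collection.
prev_exposed = prevalence(records, True)     # 30 / 200
prev_unexposed = prevalence(records, False)  # 20 / 400
prevalence_ratio = prev_exposed / prev_unexposed

print(f"Prevalence in exposed:     {prev_exposed:.2f}")
print(f"Prevalence in non-exposed: {prev_unexposed:.2f}")
print(f"Prevalence ratio:          {prevalence_ratio:.1f}")
```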
Group Exercise
• How would you conduct a cross-sectional study to assess the
association between malnutrition and school performance
among primary school children in Debre Markos town?
• 5 minutes discussion
2. Analytic Studies
• This session gives an overview of the purposes of analytic
epidemiology, the common features of analytic studies and
the types of analytic studies.
• This session will enable you to answer the questions: Why
are analytic designs needed? How do they work, and what
are their types?
Purposes of analytic epidemiology
• Focuses on the determinants of a disease by testing the
hypotheses formulated from descriptive studies, with the
ultimate goal of judging whether a particular exposure
causes or prevents disease.
• Thus analytic epidemiology is concerned with the search
for causes and effects, or the why and how.
We use analytic epidemiology
• To quantify the association between exposures and
outcomes and
• To test hypotheses about causal relationships.
• Analytic studies are broadly classified into two types of
studies to understand causes and effects:
1. Observational studies
• Case-control study
• Cohort study
• Cross-sectional study
2. Experimental (interventional) studies
Both types use a "control group"; the use of a control group is the main
distinguishing feature of analytic studies.
2.1. Observational studies
• In an observational study, which is the more common type, we simply
observe the exposure and outcome status of each study
participant.
• Information is obtained by observation of events.
• No intervention is done; there is no deliberate interference with
the natural course of disease.
A. Case-control study design
Definition
• A case-control study is one in which persons with a
condition ("cases") and suitable comparison subjects
("controls") are identified, and then the two groups are
compared with respect to prior exposure.
Subjects are sampled by their outcome status.
Overview of design
After identifying cases and controls, investigators look backward in time to
assess their exposures.
• The design is capable of evaluating the association of a disease with an
exposure many years after the actual exposure.
• Because of this and their efficiency in time and cost, case-control studies
have become the most common analytic design.
• Especially good for studying rare diseases and diseases with very long
latency periods
• It may be possible to explore a wide range of potential exposures to risk
factors for a single disease
Selection of cases
• Set a clear definition of cases
  - Depending on the certainty of the diagnosis and the
amount of information available, it is often useful to
perform analyses separately for cases classified as definite,
probable, or possible.
• Decide whether to include incident or prevalent cases
• Represent the spectrum of disease: mild, moderate and severe
groups.
Incident Vs Prevalent cases
Prevalent Cases
• Increase the sample size available for rare diseases
• More feasible
• When records are available, studies can be conducted using
secondary data alone.
  - Due to their feasibility, they are the more commonly used ones.
BUT
• Difficult to establish the temporal sequence between exposure and
outcome (reverse causation).
  - E.g. physical inactivity and CHD
Incident cases
• Helpful to establish the temporal relationship between exposure
and outcome.
• Records are easily obtainable.
• Recall bias is not a serious problem (it is minimized).
• However, they are more expensive because of the follow-up needed
to recruit the new cases.
Note: The ideal set of cases would be new (incident) cases.
Sources of cases
Hospital or health care facility
• This approach is referred to as a hospital-based case-control
study
– Easy and inexpensive to conduct
– Prone to selection bias
General population
• Referred to as a population-based case-control study
• Involves locating and obtaining data from all affected
individuals or a random sample from a defined population
• It avoids bias arising from whatever selection factors lead
affected individuals to utilize a particular health care
facility or physician
• Allows the description of the entire picture of the disease in
that population
• Not routinely used because of logistic and cost
considerations
Selection of controls
Considerations:
• Avoiding selection bias.
• Avoiding information (‘recall’) bias
There is no control group that is optimal for all situations.
Sources of controls
A. Hospital Controls
Advantages
• Easily identified and readily available in sufficient number
with reduced cost.
• More likely than healthy individuals to be aware of
antecedent exposures or events (minimize recall bias).
• Controls are also likely to have been subject to the same
intangible selection factors, which minimizes selection bias.
• More likely to be cooperative because they anticipate
benefit from their involvement or might think that it is
related to their illness, which reduces bias due to non-response.
Disadvantages
• Ill individuals are different from healthy individuals.
• Several studies in the West have demonstrated that
hospitalized patients are more likely to smoke cigarettes, use
oral contraceptives, and be heavy drinkers of alcohol than
non-hospitalized individuals.
• Danger of altering the direction of association or masking a
true association between exposure and outcome of interest
  - For example, in studying the association of cigarette
smoking and lung cancer, individuals with other
respiratory illnesses should not be taken as controls, since
smoking is also known to have some association with
other respiratory illnesses.
B. General population controls
Advantages:
• Generalizable
• Good when cases are selected to represent affected
individuals in a defined population.
Disadvantages:
• Costly and time consuming
• Recall bias: controls may not recall exposures with the
same level of accuracy as cases.
• People might be less motivated to participate for the
reason given above, which increases the non-response rate, i.e.,
selection bias.
C. Special controls
• Special controls are individuals who are related to the
cases in some way.
• These are friends, household members (siblings,...),
neighbours,...
Advantages
- They are healthy.
- More likely to be cooperative than members of the general
population, because of their interest in the cases.
- Offer a degree of control over some confounding factors,
such as ethnicity, socioeconomic status, or environment
Disadvantage/limitations
- If the study factor is likely to be similar between the controls and the cases, an
underestimate of the true effect of the exposure of interest
may result.
- E.g. if the study factor is diet and the controls are siblings, it will be similar for both
cases and controls.
Case control: numbers and ratio
• Ideally a single control group
• If you think one comparison group is not appropriate,
consider more than one
• The efficiency of a study increases with the control-to-case ratio up to about
4:1; beyond that, additional controls add little.
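The 4:1 rule of thumb can be illustrated with a short Python sketch. It assumes the commonly cited approximation that, with r controls per case, statistical efficiency relative to an unlimited number of controls is about r / (r + 1). That approximation and the function name are assumptions introduced here, not something stated in the slides.

```python
def relative_efficiency(controls_per_case):
    """Approximate efficiency of r controls per case relative to an
    unlimited number of controls, assuming the r / (r + 1) approximation."""
    r = controls_per_case
    return r / (r + 1)

for r in range(1, 7):
    print(f"{r}:1 -> {relative_efficiency(r):.2f}")
# Under this approximation a 4:1 ratio reaches 0.80 and 5:1 only 0.83,
# so each extra control per case buys progressively less.
```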
Ascertainment of disease and exposure status
• Any potential source should be able to provide
accurate as well as comparable information for all
study groups.
Sources of information for disease status
• Review of death certificates, case registries that maintain
ongoing surveillance.
• Office records of physicians
• Hospital admission or discharge records
• Pathology department log books
Sources of information about the exposure
• From study subjects themselves, by either interview or mail
questionnaire
• From a surrogate, such as spouses of participants or
mothers of children.
• From records (e.g. medical records).
Strengths and weaknesses of case-control studies
Advantages
• Is relatively quick and inexpensive compared with other
analytic designs
• Is particularly well suited to the evaluation of diseases
with long latent periods
• Is optimal for the evaluation of rare diseases
• Can examine multiple etiologic factors for a single disease.
Disadvantages/limitations
• Inefficient for the evaluation of rare exposures
• Cannot directly compute incidence rates of disease in
exposed and non-exposed individuals, unless the study is
population based
• In some situations, the temporal relationship between exposure
and disease may be difficult to establish.
• Is particularly prone to bias compared with other analytic
designs, in particular selection and recall bias
Small group work
• Suppose you are interested in studying whether khat chewing
causes depression.
• How will you conduct a case-control study?
• Where will you identify cases and controls?
• What are the possible challenges?
B. Cohort studies
Learning Objectives
At the end of this session, you will be able to:
• Describe the design of cohort studies
• Describe the limitations of cohort studies
• Identify applications of cohort designs
Definition
• Cohort studies are a form of longitudinal study design that
flows from the exposure to the outcome.
Synonyms: concurrent, follow-up, incidence, longitudinal,
prospective study
Overview of design
• Subjects are selected by exposure, or determinants of
interest, from a given population and followed to see if they
develop the disease or outcome of interest.
• Then the subsequent development of the outcome is assessed,
and the rate of the outcome is compared between the exposed
and non-exposed.
• The direction of inquiry about the outcome is always forwards
in time.
• However, the actual data collection can be carried out in either a
retrospective or a prospective manner, as described in the next
section on types of cohort studies.
• The two groups should be free of the study outcome at the start of follow-up.
• One of the main functions of a cohort study is to provide
information on incidence and to describe the natural history
of disease.
Figure 3: Overview of design of cohort studies
Source: LSI Training
Example
• Association between cigarette smoking and coronary heart disease
(CHD)
  - In this study, people who smoke cigarettes are considered
exposed and those who do not smoke non-exposed.
  - Both groups are followed for a period of time and compared
with regard to the frequency of development of CHD.
  - The finding of a higher frequency of CHD in smokers
compared to non-smokers would suggest that smoking is
possibly a cause of CHD.
Types of cohort studies
• There are two types of cohort studies, prospective and
retrospective.
• Classification depends on the temporal relationship
between the initiation of the study and the occurrence of
the disease.
• Both designs classify subjects in the study on the basis of
presence or absence of exposure.
Prospective cohort
• The relevant exposures may or may not have occurred at the
time the study begins.
• At the beginning of the study the outcome has not yet
occurred.
• Participants must be followed into the future to assess
incidence rates of the disease.
- Regarded as more reliable than the retrospective design, if the
sample size is large and follow-up is complete.
- Unless otherwise specified, "cohort study" refers to a prospective cohort.
Strengths
• Is of particular value when the exposure is rare
• Can examine multiple effects of a single exposure
• Can elucidate the temporal relationship between exposure and
disease.
• Allows direct measurement of the incidence of disease in the
exposed and non-exposed groups.
Limitations
• Is inefficient for the evaluation of rare diseases, unless the
attributable-risk percent is high.
• Extremely expensive and time consuming.
• Validity of the results can be seriously affected by losses to
follow-up.
Retrospective cohort
- Both the exposure and the outcome have already occurred at the
beginning of the study.
- This is possible where medical records permit accurate
assessment of both risk factors and disease outcomes, in
which case a retrospective cohort study is possible without
any prospective work.
Advantages
• Can be conducted much more quickly and cheaply.
• Is good for diseases with long latency periods
• Often uses data collected for other purposes.
Disadvantages
• Information available from records may be incomplete and possibly
non-comparable across study subjects.
• Often information on potential confounding factors is not
available from such records.
Selection of exposed group
• Selection of the exposed group should consider scientific and
feasibility issues, which include:
- The frequency of the exposure of interest in the study
population.
- The need to obtain complete and accurate exposure and
outcome information on all study subjects. Example: the use
of physicians or nurses as study subjects permits longer and fairly complete
follow-up.
• The nature of the particular research question being evaluated
• The ability to obtain sufficient exposed individuals in a
reasonable period of time: identify a high-risk population
(special group) for the exposure of interest.
Selection of controls
• Always attempt to select a comparison group that is
comparable to the exposed
population in its characteristics.
• Ensure that the information that can be obtained from the
non-exposed group is adequate for comparison with the
exposed population.
Exposure ascertainment
1. Using pre-existing records: from hospitals, employers' records, etc.
  - In some circumstances this may be the only way to obtain such
data accurately
Advantages
• Can make information available for a high proportion of the cohort.
• Relatively inexpensive to obtain.
• Allow objective and unbiased classification of exposure status
Disadvantages
- Information on exposure level may be insufficient.
- Such records frequently do not contain data on potential
confounding variables
2. Information supplied by the study subjects themselves
• Particularly useful for collecting information on exposures
that are not routinely recorded.
• A potential for bias always exists in the use of such data,
since it cannot be obtained as objectively as from
pre-existing records.
• Stigma associated with certain exposures may influence a
respondent's answer
Disadvantages
• Potential for information bias, particularly recall bias.
3. Direct physical examination or testing
• For some exposures or characteristics, direct physical
examination and/or blood testing may be necessary
• Provides an objective and unbiased means of classifying
study subjects with respect to exposure.
Outcome ascertainment
• With adequate consideration to the resources available for
the study, the aim is to obtain complete, comparable and
unbiased information on the subsequent health experience
of every study subject.
• One or a combination of the following sources could be
used:
1. Death certificate
2. Hospital records
3. Directly from the study participants
- For those who report an event of interest, additional
information such as hospital records can be obtained to
confirm the diagnosis
4. Periodic direct medical examinations
- Allows collection of objective information
5. Autopsy records
Follow-up
• This is the major challenge in cohort studies, as well as the
major cost in terms of time.
• Unless complete or nearly complete information can be
obtained, the results might be uninterpretable.
• If the loss to follow-up is not comparable between the two
exposure groups, this will also be a source of bias.
• Therefore, if a long follow-up period is needed, the
mechanism for achieving complete follow-up should be
thought through carefully in the planning of the study.
Analysis
The basic analyses in cohort studies are:
- Calculation and comparison of the incidence rates of
the outcome for exposed and non-exposed.
- Comparison of the baseline characteristics of the two groups
to ensure similarity.
Small group work
• Suppose you are interested in studying whether khat chewing
causes depression.
• How will you conduct the different types of cohort studies?
• Where will you identify exposed and non-exposed
individuals?
• What are the possible challenges?
Measure of association
Objectives
At the end of this session, you will be able to:
• Define the measures of association
• Compute and interpret values of measures of association
• Explain applications of the measures of association
Introduction
• Descriptive epidemiologic study designs help in generating
hypotheses about determinants of health.
• Those hypotheses are tested using analytic designs.
• Analytic studies identify determinants by assessing their
association with the outcome.
• The association is expressed using measures of association.
Why Estimate Comparisons?
• Overall rate of disease in an exposed group says nothing
about whether exposure is a risk factor for or causes a
disease.
• This can only be evaluated by comparing disease
occurrence in an exposed group to another group that is
usually not exposed.
• The latter group is usually called the comparison or
reference group.
Association
• Statistical relationship between exposure and disease.
• An association is said to exist between two variables when a
change in one variable parallels or coincides with a change in
another variable.
• Requires comparing two groups:
  - Exposed vs unexposed
  - Cases vs non-cases/controls.
• Variables can be related or unrelated to one another.
• If they are related, the relationship can be:
  - Positive or negative
  - Strong or weak (one variable can have a large or small effect
on the other)
  - Statistically significant or not
• A statistically significant association is one that is
unlikely to be due to chance.
• Statistical significance depends on the strength of the association and the
sample size.
• Association is not causation!
• An association is said to be causal when it is shown that a
change in the independent (exposure) variable produces a
change in the dependent (outcome) variable.
• Commonly, the strength of the association is measured by
the
  - Relative Risk (RR)
  - Odds Ratio (OR)
Relative Risk [RR]
• Risk: The probability of an event occurring over time
• Risk Ratio: The ratio of the risk of disease in the
exposed group to the risk in the unexposed group.
• We place the group that we are primarily interested in in the
numerator; we place the group we are comparing it
with in the denominator
• It estimates the magnitude (size) of an association between
exposure and disease.
• It indicates the chance of developing the disease in the
group exposed to a factor relative to the non-exposed group.
• It is usually used in cohort and experimental studies
Table 1: a 2 by 2 table indicating findings of a cohort study
From the above table the RR is calculated as:
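Assuming Table 1 uses the conventional cell labels (the same a, b, c, d listed for Table 5: a = exposed with disease, b = exposed without disease, c = unexposed with disease, d = unexposed without disease), the relative risk can be written as:

```latex
\[
RR = \frac{\text{risk in exposed}}{\text{risk in unexposed}}
   = \frac{a / (a + b)}{c / (c + d)}
\]
```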
Example-1
Table 2: Data from a cohort study of oral contraceptive (OC) use and
bacteriuria among women aged 15-49 years
Calculate RR?
Interpretation: women who used oral contraceptives had 1.4
times the risk of developing bacteriuria compared to
non-users.
Interpretation
• RR=1
– Risk in exposed = risk in non-exposed
– No association
• RR>1
– Risk in exposed > risk in non-exposed
– Implies that exposed individuals are x times as likely to develop
the outcome as non-exposed individuals, where x is the RR.
– Positive association, factor is associated with disease
– Larger RR → stronger association
• RR<1
– Risk in exposed < risk in non-exposed
– Implies that exposed individuals have a (1-x)100% lower
probability of developing the outcome than the non-exposed,
where x is the RR
– Negative association, factor is "protective"
Guideline for strength of association
• 1.0 = No association
• 1.1-1.3 = Weak
• 1.4-1.7 = Mild
• 1.8-3.0 = Moderate
• 3.0-8.0 = Strong
Q. What if RR is less than 1?
• For inverse associations (RR less than 1.0), take the
reciprocal and look it up in the above table, e.g., the reciprocal of
0.5 is 2.0, which corresponds to a "moderate"
association
• The further the RR is from 1, the stronger the
association between exposure and disease.
Exercise
• 2000 women aged over 65 years were enrolled in a study for 10
years. The investigators divided the women into two groups: 1000
women who took regular exercise (exposed) and 1000 women
who did not take regular exercise (unexposed). The investigators
recorded 800 new cases of osteoporosis, 300 in those who took
regular exercise and 500 in those who did not.
• What is the appropriate measure of association for this scenario?
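One possible working of this exercise, as a minimal Python sketch: because the women were grouped by exposure and followed for new cases, the relative risk described earlier applies. The function name `relative_risk` is a hypothetical helper; the counts are the ones given in the exercise.

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk in the exposed divided by risk in the unexposed."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return risk_exposed / risk_unexposed

# Counts from the exercise: 300 of 1000 exercisers and
# 500 of 1000 non-exercisers developed osteoporosis.
rr = relative_risk(300, 1000, 500, 1000)
percent_lower = (1 - rr) * 100  # the (1 - RR) x 100% reading used when RR < 1

print(f"RR = {rr:.2f}; risk is {percent_lower:.0f}% lower in the exposed")
```

An RR below 1 falls under the "protective" interpretation given above. Its reciprocal can be looked up in the strength-of-association guideline.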
Odds Ratio (OR)
• Odds: The ratio of the probability of an event occurring to the probability
of its not occurring.
Odds = p / (1 - p)
Where
  - p = the probability of an event
  - 1 - p = the probability that the event does not occur
• The odds ratio indicates the likelihood of having been exposed among cases
relative to controls.
Odds Ratio: The ratio of two odds, that is, the ratio of the odds of
exposure in cases to the odds of exposure in
controls.
• We can calculate either the exposure odds ratio or the disease odds ratio, which
are exactly the same
Table 5: Indicating findings of case-control studies
a = number of persons exposed and with disease
b = number of persons exposed but without disease
c = number of persons unexposed but with disease
d = number of persons unexposed and without disease
• The odds ratio is sometimes called the cross-product ratio,
because the numerator is the product of cell a and cell d,
while the denominator is the product of cell b and cell c.
• A line from cell a to cell d (for the numerator) and another
from cell b to cell c (for the denominator) creates an x or
cross on the two-by-two table.
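Using the cell labels a, b, c, d from Table 5, the cross-product described above can be written out as follows. The first line starts from the exposure odds and the second from the disease odds, which shows why the two odds ratios are the same:

```latex
\[
OR_{\text{exposure}} = \frac{a / c}{b / d} = \frac{a \times d}{b \times c}
\]
\[
OR_{\text{disease}} = \frac{a / b}{c / d} = \frac{a \times d}{b \times c}
\]
```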
Example-1: Data from a case-control study of current oral
contraceptive (OC) use and MI in pre-menopausal female
nurses (Table 3)
125
Interpretation: the odds of having MI are 1.6 times higher
among OC users than among non-users
126
Exercise
• A sample of 263 students who bought lunch at a school
cafeteria were asked whether or not they developed
gastroenteritis. Of the 225 who ate a sandwich, 109 developed
gastroenteritis; of those who did not eat a sandwich, 4 developed
gastroenteritis.
1. Construct 2x2 table
2. Compute OR
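A sketch of how the 2x2 cells could be derived from the figures in this exercise, assuming that eating the sandwich is the exposure and that the 4 remaining cases are among students who did not eat it. Variable names are illustrative.

```python
total_students = 263
ate_sandwich = 225
ill_among_ate = 109
ill_among_not_ate = 4

did_not_eat = total_students - ate_sandwich

a = ill_among_ate                    # exposed, with disease
b = ate_sandwich - ill_among_ate     # exposed, without disease
c = ill_among_not_ate                # unexposed, with disease
d = did_not_eat - ill_among_not_ate  # unexposed, without disease

odds_ratio = (a * d) / (b * c)

print("            Ill   Not ill")
print(f"Ate        {a:4d}   {b:6d}")
print(f"Did not eat{c:4d}   {d:6d}")
print(f"OR = {odds_ratio:.2f}")
```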
127
Interpretation of OR
Odds ratio = 1
• The odds of disease in the exposed and non exposed is the same, No
association.
Odds ratio > 1
• The odds of disease is greater in the exposed
• Implies that the odds of exposure is x times higher among cases than
controls
• The factor may be a risk factor.
128
Cont…
Odds ratio < 1

• The odds of disease in the exposed are less than the odds of
disease in the non exposed.
• Implies that the odds of exposure are (1-x)100% lower in
cases than in controls.
129
Measure of impact
• Comparing disease occurrence among the exposed with the
disease occurrence among the unexposed comparison
group by subtracting one from the other.
• It is Absolute Comparisons
130
Gives information on:
– The absolute effect of exposure on disease occurrence
– The excess disease risk, or disease burden, in the
exposed group compared to the unexposed group
– The public health impact of an exposure, that is, how
much disease would be prevented if the exposure were
removed
131
Cont…
• Health impact of determinants is assessed using different

types of measures:
 Attributable risk
 Attributable risk percent

 Population attributable risk and
 Population attributable risk percent.
132
A. Attributable Risk(AR)
• More precisely called prevalence difference, cumulative incidence
difference, and incidence rate difference
• It is also known as risk difference or excess risk among exposed groups,
rate difference, attributable rate.
Quantifies the excess risk in the exposed that can be attributable to
the exposure.
The number of cases among the exposed that could be eliminated if

the exposure were removed.
• Note: “attributable” implies causality
133
How to Calculate?
• Attributable risk is defined as the difference between the

incidence rates (or cumulative incidence) in the exposed and
non-exposed.
• For example, in a cohort study
AR = Incidence in exposed - Incidence in unexposed
AR = a/(a + b) – c/(c + d)
AR = Ie – Io or AR = CIe – CIo
134
Concept of Attributable risk
 Incidence in the exposed group
= Incidence not due to the exposure (background

exposure) + incidence due to the exposure
• Incidence in the non exposed group
= Incidence not due to the exposure (background

exposure)
135
Fig 1.Pictorial illustration of AR calculation
136
Example: 1
Consider the hypothetical cohort study conducted to assess
association between malaria during pregnancy and low birth weight.
Let‘s calculate AR from findings of the study indicated in the table
below.
137
Cont…
• AR = [50/100] - [100/900] = 0.39 or 39%
• The value of AR implies that, among women who had malaria
during pregnancy, 39 low birth weight deliveries per 100
deliveries are attributable to malaria.
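The same arithmetic as a short Python sketch. The counts are read from the AR calculation above (50 of 100 exposed, 100 of 900 unexposed); the variable names are illustrative.

```python
exposed_cases, exposed_total = 50, 100       # malaria during pregnancy
unexposed_cases, unexposed_total = 100, 900  # no malaria during pregnancy

incidence_exposed = exposed_cases / exposed_total        # Ie
incidence_unexposed = unexposed_cases / unexposed_total  # Io
attributable_risk = incidence_exposed - incidence_unexposed

print(f"AR = {attributable_risk:.2f}")
```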
138
B. Attributable Risk Percent (AR%)
• Also called attributable proportion or etiologic fraction
• What proportion of cases is attributed to the actual exposure

among exposed people?
• It is an estimate of the proportion of the disease in the exposed
group that could be prevented by eliminating the exposure
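As a hedged illustration, AR% is commonly computed as (Ie – Io) / Ie x 100, which is equivalent to (RR – 1) / RR x 100. The sketch below applies this to the malaria and low birth weight counts used earlier (50 of 100 exposed, 100 of 900 unexposed); the variable names are illustrative.

```python
incidence_exposed = 50 / 100     # Ie
incidence_unexposed = 100 / 900  # Io

attributable_risk = incidence_exposed - incidence_unexposed
ar_percent = attributable_risk / incidence_exposed * 100

# Equivalent form based on the relative risk.
relative_risk = incidence_exposed / incidence_unexposed
ar_percent_from_rr = (relative_risk - 1) / relative_risk * 100

print(f"AR% = {ar_percent:.1f}%")
```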
139
Fig 2. Pictorial illustration of how to compute AR%
140
Example-1
141
Preventive Fraction
 Exposures that are negatively associated with the outcome have a
relative risk below unity (1).
 If relative risk is < 1, the exposure is protective
 When the exposure is preventive (AR is less than 0), the
analogous measure is the preventive fraction:
PF = (Io – Ie) / Io = 1 – RR
142
Example: vaccine efficacy
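A minimal sketch of a vaccine efficacy calculation, assuming the usual definition of the preventive fraction, (Io – Ie) / Io, which equals 1 – RR. All counts below are hypothetical and are not taken from the original example.

```python
# Hypothetical counts, for illustration only.
vaccinated_cases, vaccinated_total = 10, 1000
unvaccinated_cases, unvaccinated_total = 50, 1000

risk_vaccinated = vaccinated_cases / vaccinated_total        # Ie (exposed = vaccinated)
risk_unvaccinated = unvaccinated_cases / unvaccinated_total  # Io

relative_risk = risk_vaccinated / risk_unvaccinated
preventive_fraction = (risk_unvaccinated - risk_vaccinated) / risk_unvaccinated

print(f"RR = {relative_risk:.2f}")
print(f"Vaccine efficacy = {preventive_fraction * 100:.0f}%")
```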
143
C. Population Attributable Risk (PAR)
• Excess risk of disease in total population attributable to

exposure.
• The AR quantifies the excess risk in the “exposed” group.
• The PAR estimates the excess rate of disease in the “total”
study population of exposed and non-exposed that is
attributable to the exposure.
144
Cont…
145
How to Calculate?
• PAR = It – Io or
• PAR = CIt – CIo
Where
• It= incidence rate in the total population
• Io =incidence rate in the non exposed
Alternatively, the PAR can be calculated as:

PAR = (AR)*Pe
• Where AR is the attributable risk and Pe is the proportion of exposed people in
the population
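The two routes to the PAR can be checked against each other. The sketch below reuses the malaria and low birth weight counts from the AR example (50 of 100 exposed, 100 of 900 unexposed) and treats the study cohort as the total population, which is an assumption made only for illustration.

```python
exposed_cases, exposed_total = 50, 100
unexposed_cases, unexposed_total = 100, 900

total_cases = exposed_cases + unexposed_cases
total_population = exposed_total + unexposed_total

incidence_total = total_cases / total_population         # It
incidence_exposed = exposed_cases / exposed_total        # Ie
incidence_unexposed = unexposed_cases / unexposed_total  # Io

# Route 1: PAR = It - Io
par_direct = incidence_total - incidence_unexposed

# Route 2: PAR = AR * Pe
attributable_risk = incidence_exposed - incidence_unexposed
proportion_exposed = exposed_total / total_population    # Pe
par_via_ar = attributable_risk * proportion_exposed

print(f"PAR (It - Io)  = {par_direct:.3f}")
print(f"PAR (AR * Pe)  = {par_via_ar:.3f}")
```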
146
Population Attributable Risk Fraction (PAR%)
What proportion of cases is attributed to the actual exposure

among the general population?
• Estimate the proportion of disease in the study population that is
attributable to the exposure and thus could be eliminated if the
exposure were eliminated.
• Expressed as a percentage of total risk in population
147
Example -2: Fast driving and Automobile Deaths
PAR = 0.018 - 0.01 = 0.008
PAR% = [(0.018 - 0.01) / 0.018] x 100 = 44%
Conclusion: 44% of driving-related deaths in population were
presumably due to fast driving
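The calculation in this example, written as a short Python sketch. The two incidence values are those given above; the variable names are illustrative.

```python
incidence_total = 0.018      # It: death rate in the total population
incidence_unexposed = 0.01   # Io: death rate among those not driving fast

par = incidence_total - incidence_unexposed
par_percent = par / incidence_total * 100

print(f"PAR  = {par:.3f}")
print(f"PAR% = {par_percent:.0f}%")
```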
148
Association and cause effect /Evaluation
of Evidences
149
Learning objectives
At the end of this session, students will be expected to:

o Discuss the difference between association and causation
o Identify the role of chance, confounding factors and bias in
establishing cause-effect relationship
o Control the role of chance, confounding factors and bias
o Apply Bradford Hill Criteria to establish cause-effect
relationship between exposure and outcome of interest
150
Brain storming
If exposure X is associated with outcome Y…..then how do we

decide if X is a cause of Y or not?
151
Judgment of causality
o Epidemiology is “…. and determinants of diseases and other health

related problems in human population and the application …..”.
o One of the major purposes of epidemiological studies is
discovering the causes of a disease.
o Judge whether the association of an exposure and a disease is
causal or not
152
Alternative explanations for the observed
association other than cause-effect relationships:
o The association may be the result of chance

o The association may be the result of bias
o The association may be the result of a confounding effect
o The exposure can be both a cause and an effect (reciprocal
causation)
E.g. Vitamin A vs. diarrhea
153
To show a Valid Statistical Association
Rule out the following as alternative explanation:
– Chance (random error) : How likely is it that what we
found is a true finding
– Bias: Whether systematic error has been built into the
study design
– Confounding: Whether an extraneous factor is related to
both the disease and the exposure
154
Chance
To control/minimize chance:
1. Designing phase
o Increase the sample size (increase the power of the
study)
2. Analysis phase
o Hypothesis testing, P-value’s role
o Confidence interval determination
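As one illustration of confidence interval determination, the sketch below computes an approximate 95% confidence interval for an odds ratio using the log-based (Woolf) method. The choice of method, the function name and the counts are assumptions made for illustration only.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with an approximate 95% CI (Woolf's log-based method)."""
    or_ = (a * d) / (b * c)
    # Standard error of ln(OR) from the four cell counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se_log_or)
    upper = math.exp(math.log(or_) + z * se_log_or)
    return or_, lower, upper

# Hypothetical counts, for illustration only.
or_, lower, upper = odds_ratio_ci(a=40, b=20, c=10, d=30)
print(f"OR = {or_:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

If the interval excludes 1, chance alone is an unlikely explanation for the association at the 5% level.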
155
Role of confounding factors
o Confounding refers to the mixing of the effect of an
extraneous variable with the effect of the exposure on the
disease of interest
o Can overestimate /underestimate the true association
156
Criteria for Confounding Factors
o Must be an independent predictor of disease with or
without exposure
o Must be associated (correlated) with exposure
o Must not be an intermediate link in a causal pathway

between exposure and outcome
157
Confounding
Alcohol → Lung Cancer
Smoking (confounder)
158
Confounding/mediator?
Diet → Cholesterol → CHD
On the causal pathway
159
Evaluating & controlling of confounding
Minimize introduction of confounding effect

 During design phase
o Randomization
o Restriction sampling
o Matching
 During analysis phase
o Stratification analysis/pooled analysis
o Multivariate analysis
160
Role of confounding factors…
o Distinguish confounding from effect modification (interaction)
• By stratification analysis
o Distinguish confounding from mediation
• By careful consideration of causal pathways (knowledge of

biological and pathological pathway)
161
Confounding
Imagine you have stratified your dataset for smoking status in the alcohol
- lung cancer association study. Would the odds ratios differ in the two
strata?
The alcohol association would yield a similar odds ratio in both strata
and would be close to unity. In confounding, the stratum-specific odds
ratios should be similar and different from the crude odds ratio by at
least 15%. Stratification is one way of identifying confounding at the time
of analysis.
If the stratum-specific odds ratios are different, then this is not

confounding but effect modification.
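The stratification check described above can be sketched in Python. The counts are hypothetical, the Mantel-Haenszel pooled odds ratio is used here as one common way to summarise the strata (the slides do not name a specific method), and the 15% comparison is taken relative to the crude odds ratio, which is an assumption.

```python
def odds_ratio(a, b, c, d):
    """Cross-product odds ratio for one 2x2 table."""
    return (a * d) / (b * c)

def mantel_haenszel_or(strata):
    """Pooled odds ratio across strata of (a, b, c, d) counts."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

# Hypothetical (a, b, c, d) counts for alcohol and lung cancer,
# stratified by smoking status. For illustration only.
smokers = (90, 30, 30, 10)
non_smokers = (10, 30, 30, 90)
strata = [smokers, non_smokers]

# The crude table ignores smoking: add the strata cell by cell.
crude_table = tuple(sum(cells) for cells in zip(*strata))
crude_or = odds_ratio(*crude_table)
stratum_ors = [odds_ratio(*s) for s in strata]
adjusted_or = mantel_haenszel_or(strata)

relative_change = abs(crude_or - adjusted_or) / crude_or
confounding_suspected = relative_change >= 0.15

print(f"Crude OR    = {crude_or:.2f}")
print(f"Stratum ORs = {[round(x, 2) for x in stratum_ors]}")
print(f"Adjusted OR = {adjusted_or:.2f}")
print(f"Confounding suspected: {confounding_suspected}")
```

Here the stratum-specific odds ratios are equal to each other and differ from the crude odds ratio, the pattern described for confounding. Clearly different stratum-specific odds ratios would instead point to effect modification.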
162
Confounding or Effect Modification
Birth Weight Leukaemia OR = 1.5
Sex
Does birth weight association differ in strength according to sex?
BOYS Birth Weight Leukaemia OR = 1.8
GIRLS Birth Weight // Leukaemia OR = 0.9
163
Read more on
• Stratified analysis
• Matched analysis
164
Role of Bias
 It is any systematic error in an epidemiologic study that
results in an incorrect estimate of the association between
exposure and outcome.
 Bias may result from the design, collection, recording, analysis or
interpretation of data
 It describes an error arising from the design, execution or
analysis, interpretation and dissemination phase
o Two most common sources of bias are selection bias and
information bias
165
Role of Bias…
o We should minimize introduction of Bias by:
– Choose appropriate study design
– Use strict and randomized sampling procedure

– Follow the activities in a protocol, strictly
– Choose and stick to standardized questionnaire/ ascertaining

instrument
– Train and blind your study participants
166
Establishing a Causal Association
Observed association
Could it be due to bias? No
Could it be due to confounding? No
Could it be due to chance? No
Could it be causal?
Apply judgment of causality

167
 Once chance, bias and confounding are all
determined to be unlikely, we can conclude that a valid
statistical association exists.
 Judge whether an association of an exposure and a
disease is causal or not.
 We should then apply criteria for judgment of causality
168
Process of establishing causation
1. Developing a hypothesis
 A hypothesis is a testable, unproved assumption based on collected data
 From descriptive studies
– Suggest possible determinants
2. Testing the hypothesis
 Using analytic studies
 Assess presence of association
3. Use criteria for establishing causation
 How to separate causal from non-causal associations in
epidemiology?
• Setting criteria
 Offered by Bradford Hill
– Guides, not rules

– Strength, consistency, specificity, temporality, biologic
gradient, plausibility, coherence, experimental evidence
and study design
170
Bradford-Hill criteria
• It is the statement of epidemiological criteria of a causal
association formulated in 1965 by Austin Bradford Hill
(1897-1991)
These criteria include:
1. Strength of the Association;
The stronger the association, the more likely that it is
causal.
Strong --- The further it is from unity.
--- If RR/OR > 1.5 or < 0.5
Weak --- If RR/OR is between 0.5 and 1.5
171
Cont…
2. Consistency of relationship;
The same association should be demonstrated by
other studies with different methods, settings and
investigators.
Special methods for combining a number of well
designed studies exist (meta-analysis).
172
3. Specificity of the association
o If a particular exposure increases the risk of a certain disease, but
does not increase the risk of other diseases, this may be taken as
evidence in favour of a cause–effect relationship (Germ theory)
Single Exposure Single Outcome
Plasmodium Malaria
HIV AIDS
173
4. Temporality
o Temporality refers to the necessity that the cause precede the onset of
the disease (effect) in time
o This is usually problematic in cross-sectional and case-control studies
Exposure → Disease
o It is generally easier to establish a temporal relationship in prospective

studies than in retrospective studies where measurements of the
exposure and outcome are made at the same time
174
Temporality…
o Any claim of causation must involve the cause preceding in
time the presumed effect
o Easier to establish in certain study designs
– Prospective cohort and experimental studies
Normal lung → Lung cancer
Exposure → TIME → Outcome

175
5. Biologic gradient (dose-response relationship)
o This assumes that the more intense the exposure, the greater the risk
of disease development
o This intensity can be measured by dose, or by duration of exposure
o The risk of disease increases with increasing exposure to a causal agent
o Duration (frequency) of exposure to risk factor
o However, there may be threshold level to cause disease
E.g. Cigarette smoking dose response
176
Dose-response relationship…
 Dose-response (‘biological gradient’)
– the relationship between the amount of exposure (dose) to a substance
and the resulting changes in outcome (response)
 If an increase in the level of exposure increases the risk of the
outcome
– this strengthens the argument for causality
Fig. Risk by level of smoking: 0 cigs/day, < 5 cigs/day, 5 - 20 cigs/day, > 20 cigs/day
177
Cont…
6. Biological Plausibility:
 Hypothesis should be coherent with what is known about
the disease, both biologically and from laboratory evidence.
 Knowledge about physiology, biology and pathology
should support the cause-effect relationship
178
Cont…
7. Study design;
It is most important to consider.
179
Cont…
8. Reversibility/Experimental evidence
Removal of a possible cause results in a reduced disease
risk
e.g. Cessation of cigarette smoking is associated with a
reduction in the risk of lung cancer relative to those who
continue.
If the cause leads to rapid irreversible changes (as in HIV
infection), then reversibility cannot be a condition for
causality.
180
Cont…
Judging the evidence
There are no completely reliable criteria for determining whether
an association is causal or not.
In judging the different aspects of causation,
The correct temporal relationship is essential,
• Once this has been found, weight should be given to
– Plausibility,
– Consistency, and
– dose-response relationship
181
THANK YOU!
182
