


Environmental Health Perspectives Supplements
1993
Principles of Study Design in Environmental Epidemiology
Hal Morgenstern1 and Duncan Thomas2
1Department of Epidemiology, University of California at Los Angeles, School of Public Health, Los Angeles, CA 90024-1772; 2Department of Preventive Medicine, University of Southern California, School of Medicine, Los Angeles, CA 90033-9987
This paper discusses the principles of study design and related methodologic issues in environmental epidemiology. Emphasis is given to studies aimed at evaluating causal hypotheses regarding exposures to suspected health hazards. Following background sections on the quantitative objectives and methods of population-based research, we present the major types of observational designs used in environmental epidemiology: first, the three basic designs involving the individual as the unit of analysis (i.e., cohort, cross-sectional, and case-control studies) and a brief discussion of genetic studies for assessing gene-environment interactions; second, various ecologic designs involving the group or region as the unit of analysis. Ecologic designs are given special emphasis in this paper because of our lack of resources or inability to accurately measure environmental exposures in large numbers of individuals. The paper concludes with a section highlighting current design issues in environmental epidemiology and several recommendations for future work. - Environ Health Perspect 101 (Suppl 4):23-38 (1993).
Key Words: Study design, epidemiologic methods, environmental health, ecologic studies, aggregate studies, causal inference

This manuscript was prepared as part of the Environmental Epidemiology Planning Project of the Health Effects Institute, September 1990 - September 1992.
*Author to whom correspondence should be addressed.
This work was funded by the Health Effects Institute in Cambridge, MA. The authors would like to thank Dr. John Tukey, Dr. Sander Greenland, and other members of the HEI Methodology Working Group for their helpful comments.

Introduction
The purpose of this article is to discuss the principles of study design and related methodologic issues in environmental epidemiology. The focus is on studies aimed at evaluating causal hypotheses regarding exposures to suspected health hazards. Because the intended audience for this document includes scientists without formal training in epidemiology, parts of this article highlight basic principles of epidemiologic research. Nevertheless, we also try to summarize comprehensively the current state of the art and make recommendations for future developments in study design. For more extensive treatment of general research principles and methods in epidemiology, the interested reader should consult available textbooks in this area (1-6). More detailed examples of applications in environmental epidemiology may be found in several other books, such as those edited by Leaverton (7), Chiazze et al. (8), Goldsmith (9), and Kopfler and Craun (10).

Population Parameters
The major quantitative objectives of most epidemiologic studies are to estimate two types of population parameters: the frequency of disease occurrence in particular populations and the effect of a given exposure on disease occurrence in a particular population.

Measures of disease frequency involve the occurrence of new cases or deaths (incidence/mortality) or the presence of existing cases (prevalence). In both applications, the number of cases is expressed relative to the size of the population from which the cases are identified. With incidence measures, this denominator is the (base) population at risk (i.e., individuals who are eligible to become cases). Thus, the base population of a study (or study base) is the group of all individuals who, if they developed the disease, would become cases in the study (3,11,12).

Disease incidence, which is central to the process of causal inference, can be expressed as a cumulative measure (risk) or as a person-time measure (rate). The cumulative incidence (incidence proportion) or average risk in a base population is the probability of someone in that population developing the disease during a specified period, conditional on not dying first from another disease (13). The term cumulative incidence or cumulative incidence rate also is defined somewhat differently as the integral over the follow-up period of the hazard (rate) function (14). The incidence rate or instantaneous risk (hazard) is the limit of the average risk for a given period, per unit of time, as the duration of the period approaches zero. The average rate (incidence density) for a given period is estimated as the number of incident events divided by the amount of person-time experienced by the base population. For example, a rate of 0.001/year means that we would expect one new case to occur for every 1000 person-years of follow-up (e.g., 100 disease-free people followed for an average of 10 years).

Although there are many quantitative methods for expressing the magnitude of a statistical association between two variables (e.g., exposure status and disease occurrence), we are usually interested in a special class of such measures that reflect the net effect of the exposure on disease occurrence (i.e., causal parameters). In general, a causal parameter for a target population is a hypothetical contrast, in the form of a difference or ratio, between what the frequency of disease would be if everyone were exposed (at a given level) and what the frequency would be if everyone were unexposed (often called the reference level) (15). When this difference for a specific exposure is not zero (the ratio is not one), we say that the exposure is a risk factor for that disease in the target population. In practice, we estimate causal parameters indirectly by comparing disease frequency for an exposed group with disease frequency for an unexposed group. Epidemiologists typically estimate the risk or rate ratio (often called the relative risk) by comparing the exposed population with an unexposed population. The key assumption of this statistical approach is that the risk or rate observed for the unexposed group is the same (within confounder strata) as the risk or rate that would have been observed in the exposed group if that group had not been exposed

Environmental Health Perspectives Supplements 23


Volume 101, Supplement 4, December 1993
MORGENSTERN AND THOMAS

(16). Thus, the (true) risk ratio may be interpreted as a causal parameter, which is the number of cases actually occurring in the exposed (target) population divided by the number of cases that would have occurred in the absence of exposure.

Certain measures of association, such as correlation coefficients and standardized regression coefficients, do not, in general, reflect any causal parameters. The reason is that the magnitude of these measures depends in part on the relative variances of the exposure and disease variables, which are influenced by the sampling strategy (i.e., noncausal parameters) (17,18). Another measure of association, the odds ratio, is used in certain types of epidemiologic studies (case-control designs) to estimate the risk or rate ratio indirectly when we cannot first estimate the incidence rate or risk in the exposed and unexposed populations (1-6,19,20).

Problems in Environmental Epidemiology
There are several general problems in environmental epidemiology that tend to limit causal inference and, therefore, shape design decisions.

Long Latent Periods. The interval between first exposure to an environmental risk factor (or the start of causal action of this factor) and disease detection (or symptom onset) may be many years or even decades. Such long latent periods are partly due to limitations of medical technology and incomplete surveillance for detecting disease; yet they are also due to a prolonged induction period in which years are needed for the disease process to begin (5). The term latent period also is used more specifically to indicate the hypothetical interval between disease initiation and detection (5). Refer also to Armenian and Lilienfeld (21), who discuss alternative definitions of latency. Unfortunately, long latent periods produce important practical constraints on our ability to estimate exposure effects. The investigator must either observe subjects for many years or rely on retrospective (historical) measurement of key variables. The latter alternative may be infeasible for certain types of exposures or in certain populations. Even when feasible, however, retrospective measurement usually increases the amount of error with which exposures are measured (see below). Furthermore, the level of most environmental exposures and many extraneous risk factors changes appreciably or unpredictably over time; long latent periods, therefore, seriously complicate our ability to estimate effects (22).

Errors of Exposure Measurement. A major challenge in environmental epidemiology is to measure accurately each individual's exposure to hypothesized risk factors (i.e., the biologically relevant dose [Thomas and Hatch, this issue]). This task is made very difficult by the lack of information about environmental sources of emission, the complex pattern of most long-term exposures, the individual's ignorance of previous opportunities for exposure, the lack of good biological indicators of exposure level, and the lack of sufficient resources to collect individual exposure data on large populations. The consequences of exposure mismeasurement are probable bias in the estimation of effect (see "Sources of Epidemiologic Bias") and possible loss of precision and power with which effects are estimated and tested (23,24). The problem and issues of exposure measurement are discussed more thoroughly by Hatch and Thomas in this issue.

Rare Diseases, Low-Level Exposures, and Small Effects. In most epidemiologic studies of environmental hazards, statistical objectives may be further compromised by the infrequent occurrence of the disease or outcome of interest, by the low prevalence or levels of environmental exposures in the general population, and by the search for small effects (for which the true rate ratio is between 0.5 and 2). A critical consequence of these features is usually substantial loss of precision and power with which effects are estimated and tested. In addition, it becomes more difficult for the investigator to separate the effect of the exposure of interest from the distorting effects of extraneous factors. Causal inference can then be seriously compromised.

Research Objectives and Strategies
Given the above problems, epidemiologists must carefully plan their studies, analyze their data, and interpret their findings. Inaccurate results reflect both random errors of estimation (chance) and systematic errors or bias. An epidemiologically unbiased or valid estimate of a causal parameter is one that is expected to represent perfectly (aside from chance) the true value of the parameter in the base population.

Sources of Epidemiologic Bias
A common framework for describing the validity of epidemiologic research is to consider three sources of bias in the estimation of effect: selection bias, information bias, and confounding (2). Despite the practical attractiveness of this framework, the three types of bias are not entirely separate concepts. The amount of confounding, for example, can depend on how subjects are selected.

Selection Bias. Selection bias means that the way in which subjects are selected into the study population or into the analysis (due to lost subjects or missing data) distorts the effect estimate. In general, this problem occurs when either disease status or exposure status influences the selection of subjects to a different extent in the groups being compared. Selection bias is most likely to be problematic when the investigator does not identify the base population from which study cases arose.

Information Bias. Information bias means that the nature or quality of measurement or data collection distorts the effect estimate. The primary source of information bias is error in measuring one or more variables. When exposure status or disease status is misclassified, bias usually occurs. If the probabilities of misclassification of each variable are the same for each category of the other variable (nondifferential misclassification) and if the errors for different variables are independent, the estimate of effect is usually biased toward the null value (indicating no effect). Possible exceptions to this principle of nondifferential misclassification leading to conservative effect estimates arise when the misclassified exposure variable is categorized into more than two groups (25). In other situations involving differential misclassification (unequal misclassification probabilities) or correlated measurement errors, the effect estimate may be biased in either direction. In many studies, therefore, the magnitude of misclassification bias is difficult to predict, especially when other biases are operating.

Confounding. Confounding refers to a lack of comparability between exposure groups (e.g., exposed versus unexposed) such that disease risk would be different even if the exposure were absent or the same in both populations (16). Thus, confounding is epidemiologic bias in the estimation of a causal parameter (see "Population Parameters"). Because there is no empirical method for directly observing the presence or magnitude of confounding, in practice we attempt to identify and control for manifestations of confounding. This is done by searching for differences between exposure groups in the distribution of extraneous risk factors for the disease, which are called confounders. Thus, a confounder is a risk factor (or proxy) that is associated with exposure status in the base population. A covariate meeting these criteria is not a confounder, however, if its association with the exposure is due entirely to

STUDY DESIGN

the effect of the exposure on the covariate; for example, the covariate might be an intermediate variable in the causal pathway between the exposure and disease. If the exposure and covariate are time-dependent variables, it is possible for that covariate to be both a confounder and an intermediate variable (see "Cohort Study").

The Need for Covariate Data
In addition to the exposure of interest, there is the need in virtually all epidemiologic studies to collect data on other known or possible risk factors for the disease. These covariates may be relevant to the exposure effect in three ways: a) as confounders, b) as intermediate variables, and c) as effect modifiers.

The effects of confounders must be controlled or removed analytically to obtain unbiased estimates of causal parameters. This control is usually achieved through stratification or model fitting.

The assessment and control of intermediate variables can elucidate causal mechanisms that explain exposure effects (26). This approach often leads to new etiologic hypotheses and new intervention strategies for disease prevention.

When the exposure-effect measure varies across categories or levels of another factor, we call the second factor an effect modifier; this statistical phenomenon is called effect modification or an interaction effect. The assessment of effect modification is model-dependent, meaning that it depends on what (causal) parameter is used to measure the effect (2-6). For example, an extraneous risk factor that does not modify the risk ratio for the exposure will modify the risk difference, provided the exposure has some effect (i.e., the risk ratio is not one). The assessment of effect modification is important for properly specifying the predictors in statistical models (2,14), for making inferences about possible biological (causal) interactions between exposures (e.g., synergy) (5), and for generalizing one's results to other populations (see "Cohort Study").

Types of Research
There are three general design strategies for conducting population research: a) experiments in which the investigators randomly assign (randomize) subjects to two or more treatment (exposure) groups; b) quasi-experiments in which the investigators make the assignments to treatment groups nonrandomly; and c) observational studies in which the investigators simply observe exposure (treatment) status in subjects without assignment (2). Although some epidemiologists classify the first two types as intervention studies, observational studies might also involve the evaluation of an intervention that was not implemented or controlled by the investigators. Social scientists often use the term quasi-experiment to mean any type of nonrandomized study (27).

Experiments. In a simple experiment, there are usually two treatment groups. One group is assigned to receive the new experimental intervention and the other (control) group is assigned to receive no intervention, a sham intervention (placebo), or another available intervention. Simple randomization of individuals to treatment groups implies that all possible allocation schemes of assigned subjects are equally likely (28). Following randomization, the investigator follows subjects for subsequent disease occurrence or change in outcome status. A comparison of risks between treatment groups provides an estimate of a causal parameter reflecting the treatment effect.

Because experiments are best suited ethically and practically to the study of health benefits, not hazards, experiments in environmental epidemiology would usually be limited to the study of preventive interventions. Furthermore, it is generally impossible or infeasible to randomize subjects individually. The only practical alternative, therefore, is to randomize by group, where the group might be a city, school, work site, etc. (29). The major limitation of group randomization is some within-group dependence (correlation) of the outcome variable, which reduces precision and power (30,31). Thus, the effective sample size falls between the number of randomized groups and the total number of subjects (see Prentice and Thomas, this issue).

As an example, consider the hypothesis that the intake of fluoride ions in drinking water has a protective effect on the occurrence of dental caries in children. An experiment might be conducted by randomly assigning many water districts (each with one fluoride-deficient water supply without treatment) either to implement sodium fluoride treatment under the control of the investigators or to continue its current policy of no treatment for the duration of follow-up. Assuming the hypothesis were true, we would expect the subsequent incidence rate of dental caries to be lower in the treated districts than in the untreated districts.

Randomization ensures a valid comparison of subjects according to intended treatment, i.e., assigned treatment, but not according to treatment actually received (16,28). That is, randomization of a sufficient number of units (subjects or groups) provides some assurance that the assigned treatment groups are comparable with respect to inherent risk. This does not imply that there can be no confounding in a comparison of randomly assigned groups. Even with perfect adherence to treatment assignments and no loss to follow-up, assigned groups might have, by chance, different hypothetical risks in the absence of treatment. Nevertheless, such confounding, if it exists, is equally likely to be positive or negative; conventional confidence-interval estimates and p values reflect the possibility of this bias, which becomes smaller as the (effective) sample size increases (28). This protection against confounding afforded by randomization, however, does not apply to lack of adherence or loss to follow-up, both of which usually do not occur randomly. Furthermore, if some subjects cross over between treatments (e.g., residents of a fluoridated district obtain their water from nonfluoridated districts), a comparison of assigned groups will underestimate the true treatment effect even when the crossover is random (32). A comparison of compliers with noncompliers, on the other hand, is essentially observational and therefore prone to bias.

Quasi-Experiments. A quasi-experiment may be done similarly to an experiment by comparing two or more nonrandomized groups, or it may be done by comparing one or more groups over time, before versus after the intervention is initiated in at least one group. With the latter approach, the composition of each group may change over time so that subjects observed before the intervention are not the same subjects observed after the intervention.

Returning to the fluoride hypothesis, a quasi-experiment was done in the 1940s and 1950s by comparing two similar, nearby cities in New York State, both of which lacked fluoride treatment before 1945. Newburgh started sodium fluoride treatment in 1945 and continued throughout the 10-yr postintervention follow-up period; Kingston continued to use its fluoride-deficient water without treatment (33). The investigators found that the rate of decayed, missing, or filled (DMF) teeth in children, ages 6 to 12, decreased by almost 50% in Newburgh but increased slightly in Kingston.

Because subjects were not individually randomized in this study, it is possible that children in the treated group differed from children in the comparison group with respect to other risk factors for tooth decay, such as diet. Thus, the investigators' comparisons might have been confounded. Note, however, that randomization by city would not have reduced this possible bias


in the Newburgh-Kingston study, because the two assigned treatment groups would be equally noncomparable regardless of which city was assigned fluoride treatment.

Observational Studies. Unlike experiments and quasi-experiments, observational studies are commonly used to estimate the effects of exposures hypothesized to be harmful, fixed attributes (e.g., race and genotype), characteristics, behaviors or exposures over which the investigator has little or no control (e.g., weight, depression, and sunlight exposure), and other exposures for which manipulation or randomization would be unethical or infeasible. Observational studies are often conducted with secondary or retrospective data (instead of primary prospective data) and/or without following individual subjects for change in disease status. For example, the fluoride hypothesis could be tested by comparing the prevalence of decayed, missing, or filled teeth in children who live in areas supplied by fluoridated water with the corresponding prevalence in children who live in areas supplied by nonfluoridated water. Although such a study would be less expensive and easier to conduct than would the previous examples, there are additional methodologic problems that could lead to bias or misinterpretations.

The remainder of this article is devoted to an elaboration of observational study designs. In "Basic Observational Designs," we cover the basic designs in which data on disease status, exposure status, and all covariates are collected at the individual level; that is, the unit of analysis is the individual (or body part, such as the tooth or eye). In "Ecological Designs," we cover designs in which the unit of analysis is a group of individuals, such that information is missing on the joint distributions of key variables at the individual level.

Basic Observational Designs
Frequently, hypotheses about environmental risk factors for disease are derived from animal studies, clinical observations, reports of disease clusters, descriptive findings from population surveillance systems, and various types of exploratory studies (e.g., case series, mapping studies, and migrant studies). Formal testing of these hypotheses most often proceeds by conducting observational studies of the types described in this section.

Basic designs in epidemiology may be classified according to two dimensions: type of study population and type of sampling scheme (34). First, the study population is longitudinal, involving the detection of incident events during a follow-up period; or it is cross-sectional, involving the detection of prevalent cases at one time. Second, the sampling strategy involves complete selection of the entire population from which study cases are identified, or it involves incomplete or case-control sampling of a fraction (<100%) of the noncases in the population from which study cases are identified. Case-control sampling, therefore, implies stratification on disease status in the selection process. Combining these two dimensions results in four basic designs: longitudinal studies of a complete population (cohort studies); cross-sectional studies of a complete population (cross-sectional studies); longitudinal studies with case-control sampling (case-control studies with incident cases); and cross-sectional studies with case-control sampling (case-control studies with prevalent cases). In addition to these basic designs, we also discuss new developments in genetic studies for assessing gene-environment interactions (see "Genetic Studies").

Cohort Study
A cohort or follow-up study is a longitudinal design of a specified population in which exposure status is measured for all subjects at the start of follow-up (baseline) and possibly during follow-up. The entire study population (typically persons who are free of the index disease at baseline) is followed for detection of all incident cases or deaths of interest. Thus, the base population in this design is identical to the study population.

Cohort studies may be entirely prospective, meaning exposure status and disease occurrence are ascertained for the period during which the study is conducted, or they may be entirely retrospective (historical), meaning exposure status and disease occurrence are ascertained for a period before the study begins. Retrospective data are usually obtained from the subject's recall of past events or from abstracted records. Many cohort studies combine both data-collection procedures; e.g., the follow-up period for detecting the disease starts before the study and continues throughout the study period. Although retrospective studies are generally much less expensive and time-consuming, prospective studies can be designed to collect more appropriate, complete, and accurate data.

Example. Suppose we want to estimate the possible effect of prenatal exposure to passive smoke (not maternal smoking) on the risk of lower respiratory disease during the first 3 years of life. We might identify a large group of nonsmoking pregnant women and interview them just before delivery about their exposure to passive smoke during pregnancy and about other risk factors for the disease. The assessment of passive smoking would involve measuring exposure at home, work, and elsewhere with an attempt to quantify the number of smokers, cigarettes, and/or exposure time for each woman by trimester. Then each neonate would be followed by periodic examinations and parental reports of symptoms to his or her third birthday. By establishing a standard set of criteria for diagnosing new cases of lower respiratory disease and by categorizing the passive-smoke exposure into two or more categories, we can compare the 3-year risk of disease by exposure group. In this hypothetical example, the experience of each subject contributes to a single exposure group. Since subjects are not randomized to exposure groups, it is important to control analytically for other risk factors that are associated with exposure status in the study (base) population. For example, we might want to control for the child's exposure to passive smoke at home; if other family members smoked during the mother's pregnancy, they are also likely to have smoked during the child's first 3 years of life. On the other hand, we should probably not control for birth weight even if it is a risk factor for the disease, because prenatal smoking affects birth weight. Thus, provided low birth weight is a risk factor for lower respiratory disease during the first three years of life, low birth weight is likely to be an intermediate variable in the causal pathway between prenatal exposure to passive smoke and the disease.

Strengths of a Cohort Design. The prospective cohort study is the observational design that is most similar to an experiment. The major strengths of this design derive from the fact that disease occurs and is detected after subjects are selected and after exposure status is measured. Thus, we can usually be confident that the exposure preceded the disease (i.e., there is no temporal ambiguity). This feature is particularly important when disease can also influence exposure status (e.g., persons with asthma moving to drier, less-polluted areas). Well-designed retrospective cohort studies also lack temporal ambiguity of cause and effect.

Another major strength of the cohort design is the usual lack of selection bias that threatens other basic designs (2). Disease status cannot, in principle, influence the selection of subjects except, perhaps, in poorly designed retrospective cohort studies. Sometimes researchers, ignoring this principle, propose random sampling to reduce bias. In fact, random sampling in a cohort study, unlike random


assignment, does not prevent or necessarily At best, baseline similarities between lost and affected by previous exposure status, that risk
reduce epidemiologic bias in effect estima- followed subjects only suggest that loss to fol- factor can be a confounder and an intermedi-
tion; i.e., random sampling generally does low-up is probably not a major threat to ate variable simultaneously (36,37). For
not improve comparability between expo- validity, especially if the attrition rate is low. example, suppose we want to estimate the
sure groups. It does, however, make the Perhaps the major practical limitation of effect of exposure duration on mortality from
study population representative of a larger a cohort design, especially prospective stud- a specific disease. If early symptoms of the
, well-defined source population (sampling ies, is its inefficiency for studying rare out- disease lead to termination of exposure, then
frame), which may make one's findings come events, which is what most diseases are early symptoms, which is a risk factor for dis-
more generalizable. For example, suppose in nonclinical populations. Because expo- ease mortality, is both a confounder and an
we initiated a prospective cohort study of sure status and other covariates must be intermediate variable of the exposure-disease
lung cancer by mailing questionnaires to a observed at the start of follow-up in the relationship. Consequently, standard meth-
random sample of 500,000 adults living in entire study population, a rare disease would ods of analysis will generally lead to a biased
a given region served by population cancer mean that most subjects will remain non- estimate of the exposure effect, whether or not
registries. The questionnaire would request cases. Comparing a small number of cases one adjusts for the risk factor.
information on previous cancer diagnoses, with a large number of noncases is statisti- A statistical solution to the above prob-
exposure variables, and other risk factors cally and economically inefficient because of lems was recently developed by Robins
for lung cancer. Following responses by the diminishing marginal return from addi- (36,37) who treats the prolonged or chang-
100,000 selected residents, the cancer reg- tional noncases. Assuming a fixed sample ing predictor variables as time-dependent
istries would be used to identify all new size, therefore, it is more efficient to study a covariates for which repeated observations
cases of lung cancer diagnosed among disease with an expected risk of 30% than to are collected during the follow-up. The
respondents during the subsequent 5 years. study a disease with an expected risk of 1%; method involves estimating causal parame-
Even though the 100,000 respondents will the former will result in more precision and ters for hypothetical exposure experiences of
differ in many ways from the 400,000 non- power for estimating and testing the expo- the study population (15). For example, we
respondents, these differences will not sure effect. Moreover, substantial increases might want to compare the outcome risk for
cause epidemiologic bias in effect estima- in the sample size to compensate for too few all subjects had they remained exposed
tion. Nevertheless, the exposure effect expected cases is often impractical or impos- throughout follow-up with these subjects
observed for respondents (the base popula- sible, especially when the size of the exposed had they remained unexposed, controlling
tion) may not be generalizable to the popu- population available for study is limited. for confounders at the start of each interval
lation of nonrespondents. One possible Time-Dependent Exposures. In conven- (time stratum).
reason for this lack of generalizability is tional analyses of cohort-study data, exposure
that respondents and nonrespondents differ status and other covariates are usually treated Cros-Secton Study
on the joint distribution of one or more as fixed variables measured at baseline. Yet A cross-sectional design involves a single
effect modifiers. the instantaneous and cumulative level of ascertainment of disease prevalence in a
As we will see in the next two sections, most environmental exposures changes during study population that is usually sampled
the same level of nonresponse in a cross- the follow-up period. Consequently, the randomly from a single source population.
sectional or case-control study that we greater the change and the longer the follow- In this sense, the source population is that
assumed in the above cohort example up, the less appropriate are conventional larger group of individuals who are desig-
might seriously threaten the validity of methods of analysis. A common solution to nated by the investigator as being eligible
effect estimation. Thus, unlike cohort (or this problem is to measure average exposure, for inclusion in the study. Generally, in a
randomized) studies, nonresponse in other duration of exposure, or cumulative exposure cross-sectional study, we do not know how
basic designs can easily introduce selection before and during the follow-up period; then long prevalent (existing) cases have had the
bias because study cases have already these variables are analyzed like the simple base- disease, nor can we identify the base popu-
occurred when subjects are selected. As line exposure variable, as possible (fixed) predic- lation (at risk) from which the study cases
noted in "Sources of Epidemiologic Bias," tors of disease occurrence. Unfortunately, this arose. Exposure data on time-dependent
selection bias is most likely to be problem- approach also has methodologic problems: variables are usually measured retrospec-
atic when the investigator does not identify the a) if the follow-up period for detecting disease tively to allow for expected variations in disease
base population from which study cases arose overlaps the period during which exposure latency (before detection) and duration of
(as in cross-sectional studies and certain change is measured, the temporal relationship expression (after detection).
case-control studies). of an exposure-disease association is ambigu- The statistical analysis of cross-sectional
Weaknesses of a Cohort Design. A ous. We may not know whether exposure data typically resembles the analysis of
potential weakness of cohort designs is the changes preceded disease occurrence or disease cohort or case-control data. Instead of
loss of subjects to follow-up due to death preceded changes in exposure level. b) If the comparing disease risks for exposed and
from other diseases, lack of participation, levels of exposure and/or other risk factors unexposed groups, we compare disease
or migration. Unlike subject selection, loss change over time, the associations between the prevalences (P), as in a cohort study, or we
to follow-up can easily bias effect estima- exposure and these covariates also can change; compare the prevalence odds (P/(1-P)), as
tion if attrition is associated with disease then the amount of confounding of the esti- in a case-control study (see "Case-Control
risk to a different extent for exposed and mated exposure effect will change. The ana- Study"). Under certain conditions or assump-
unexposed groups (2,35). Unfortunately, lytic method described above, therefore, will tions, the prevalence ratio or prevalence
we can neither rule out nor confirm such not, in general, eliminate confounding due to odds ratio is approximately equal to the
bias by comparing lost subjects and fol- these risk factors (even when there is no mis- ratio of incidence rates or risks (i.e., the
lowed subjects with respect to baseline dassification). c) When an extraneous risk causal parameter of interest) (2,38). For
characteristics (including risk factors) (35). factor affects subsequent exposure status and is example, disease prevalence in a population

Environmental Health Perspectives Supplements 27


MORGENSTERN AND THOMAS

is a function of both incidence and the duration of disease. If the mean duration of disease (from onset to recovery or death) is known to be identical for exposed and unexposed cases, we can be more confident that the prevalence odds ratio approximates the incidence rate ratio.

Example. Suppose we want to estimate the possible effect of prenatal exposure to passive smoke (as in "Cohort Study") on birth weight, categorized for convenience into low (<2500 g) and normal. We identify all live births delivered in one hospital during a given period (the source population); then we take a random or quasi-random sample (e.g., every third birth). By obtaining exposure data retrospectively from mothers near the time of delivery, we can compare the prevalence of low birth weight for infants prenatally exposed and unexposed to passive smoke, controlling analytically for confounders (e.g., maternal age, maternal smoking, and prenatal care). Even though births may be regarded as incident events, the infant's weight at birth is a prevalence measure, because we do not know the size of the base population. The causal parameter of interest is a hypothetical comparison of retarded development between fetuses exposed to passive smoke and those fetuses had they not been exposed. Not only can we not observe this hypothetical condition of exposed fetuses being unexposed, but we do not (or cannot) follow the base population; the prevalence of low birth weight is simply the end result of that hypothetical follow-up.

Strengths of a Cross-Sectional Design. Because there is no follow-up, cross-sectional studies are less time-consuming and costly than prospective cohort studies. It is also feasible to examine many exposures and diseases in the same study, which makes this design useful for screening new hypotheses. In addition to causal inference, cross-sectional studies are important descriptively in health administration, planning, and policy analysis; information on disease prevalence is often required to assess the need and demand for health services and to evaluate intervention programs in specific target populations (2).

Weaknesses of a Cross-Sectional Design. A major methodologic limitation of many cross-sectional studies for making causal inferences is temporal ambiguity of cause and effect. Because we usually do not know the duration of the disease in prevalent cases and because exposure status is measured at the same time as disease status, often we cannot determine that exposure (or a certain accumulation of exposure) preceded disease occurrence. One approach for minimizing this problem is to collect retrospective exposure data and information on previous medical diagnoses and the onset of symptoms associated with the disease under study. Not only may this approach be very uninformative for temporally linking exposure and disease, but it is also likely to worsen another potential problem, measurement error. Reliance on retrospective data increases the likelihood and magnitude of measurement errors, which generally leads to information bias. Furthermore, because all data are collected after disease has occurred, it is very possible for the error in measuring one variable to be related to the other variable (differential misclassification) or to error in measuring the other variable (correlated errors). Such possibilities are particularly likely in survey research and make potential information bias severe and unpredictable.

When cross-sectional studies are conducted without random sampling, they offer little opportunity for making statistical inferences about descriptive, population-specific parameters, e.g., the prevalence of a disease in a specified source population (28). The lack of random sampling may also worsen the potential problem of selection bias in effect estimation, which would be difficult to rule out a priori or to correct in the analysis. Even with random sampling, however, disease status or exposure status can influence the selection of subjects differentially by category of the other variable. For example, exposed cases may be less likely than others to be selected for study, perhaps because new exposed cases are less likely to survive than new unexposed cases (i.e., selective survival) or because exposed cases are less likely to enter the specified source population such as a hospital (i.e., Berkson's bias) (2). Similarly, selection bias can result from the differential participation of selected subjects (i.e., response bias).

Case-Control Study

Case-control studies are distinguished from other basic designs by their sampling strategy: The investigator selects only a fraction of noncases (controls) from the population from which the cases were identified (2,3,5,34,39). Sometimes this population is not the true (primary) base population (out of necessity or convenience), and occasionally controls are assembled without regard for the identification of cases. The design may be longitudinal, involving incident cases, or cross-sectional, involving prevalent cases. In both types, the investigator establishes the ratio of controls to cases, which does not depend directly on the frequency of disease in the population. As in cross-sectional studies, exposure data on time-dependent variables are generally measured retrospectively to account for expected variations in disease latency.

Estimation of Effect. Unless the crude disease rate or the size of the base population is known, we cannot estimate the risk or rate of the disease in the exposed and unexposed populations. Nevertheless, we can estimate the effect of the exposure on disease by calculating the exposure odds ratio, which computationally is similar to the prevalence odds ratio in a cross-sectional study (2,3,19,20). For this estimation of effect to be valid, however, the controls must be representative of the base population that gave rise to the study cases. In this context, representative means having a similar distribution on other disease risk factors and indicators of disease detection. The best method for making the controls representative in this way is to sample them randomly (with or without matching) from the base population (see below).

Matching. As in any observational study, the investigator should control analytically for confounders by stratification or model fitting. Intuitively, it would appear that one method for achieving this control is to match controls to cases on extraneous risk factors (i.e., making controls similar to cases on the joint distribution of these risk factors). In a case-control study, however, it is not the matching alone that controls for the confounding effects of the matching variables; rather, stratification in the analysis eliminates this bias (1-6). In fact, the net effect of matching in case-control studies (but not in cohort studies) is to introduce selection bias that must be controlled in the analysis. Thus, if the matching is ignored in the analysis, the effect estimate will usually be biased (2,4,14).

The potential advantage of matching in the selection of subjects is that it allows the investigator to control for confounders more efficiently than if matching is not used (1-6). Yet, in this regard, matching can be counterproductive if one matches in a case-control study on strong correlates of exposure in the base population that are not risk factors (or proxy risk factors) for the disease. This type of overmatching results in a decrease in statistical efficiency (i.e., less precision for a given number of cases and controls) (1-6). The conditions for overmatching, however, are very different in cohort studies in which unexposed subjects are matched to exposed subjects (40). Matching can also be economically counterproductive for achieving a certain minimal precision if it costs more to match
than to increase the sample size without matching (41).

Population-Based Case-Control Study. In a population-based or hybrid case-control study, controls (noncases) are sampled directly from the base population that gave rise to the cases (39,42). When this design involves the follow-up of a large dynamic population, such as residents of a state, identification of new cases is usually based on data collected through a population registry. The validity of effect estimation depends on the completeness and accuracy of case ascertainment and on careful description of the base population. When the design involves the follow-up of individuals in a fixed cohort (e.g., as a part of a clinical trial or cohort study), identification of new cases is done by exams, interviews, or questionnaires administered periodically to each individual in the cohort during the follow-up. This latter strategy is now called a nested case-control study but also has been called a synthetic case-control study (43).

There are three alternative methods for selecting controls in a longitudinal, population-based case-control study: a) In density sampling, controls are selected longitudinally throughout the follow-up. Typically, they are individually matched to cases on time of each case's diagnosis or identification and possibly other factors; i.e., each control is known to be at risk (disease-free) at the time its matched case was first identified as diseased. An advantage of time matching is that exposure status is measured at about the same time for all subjects in each matched set (19). b) In cumulative sampling, all controls are selected at the end of the follow-up period during which cases are identified. Both cumulative- and density-sampling methods can be used even when controls are not selected directly from the base population. c) In case-base or case-cohort sampling, all controls are selected from the fixed base population at the start of the follow-up (42,44,45). An advantage of this method is that one control group can be used to study multiple diseases, provided that prevalent cases of each disease are excluded from the analyses involving that disease. In both case-base and density sampling, it is possible for a selected control to subsequently develop the disease and become a case in the study.

Example. Suppose we want to estimate the possible effect of prenatal exposure to passive smoke (as in previous examples) on the risk of sudden infant death syndrome (SIDS). Using hospital records and birth certificate information, we identify a large number of live births occurring in a given region during a certain period. Then this base population is followed prospectively, using hospital records and/or a population registry to identify all infant deaths. For each diagnosed and confirmed case of SIDS, we randomly select two live controls matched to the case on age, race, and date of the case's death; thus, controls are density sampled from the follow-up experience (risk set) of the base population of live births. As soon as possible after case detection, we interview the mothers of all subjects to collect data on prenatal exposure to passive smoke and other covariates.

Proportional (Case-Control) Study. A proportional study is a special type of case-control study in which selected controls have developed or died from diseases other than the index disease under study (2). By definition, therefore, this is not a population-based design, since controls (especially deaths) may not be representative of the base population from which study cases arose. In a proportional morbidity study, both cases and controls are selected from a clinical population such as a hospital, clinic, physician's practice, or screening program. Controls are selected because they have other conditions or symptoms; thus, they are likely to differ from the base population of cases in ways that affect disease occurrence or detection. This situation will usually occur when the exposure is a risk factor for those comparison diseases making up the control group. For example, we would obtain a severely biased estimate of the smoking effect in a hospital-based, case-control study if controls were selected from emphysema patients, because smoking is a strong risk factor for emphysema.

In a proportional mortality ratio (PMR) study, the entire study population consists of deaths. A group of deaths from the index disease (cases) is compared with a group of deaths from other diseases that might include selected comparison disease(s) or all other causes of death. Typically, all study deaths are identified retrospectively from the follow-up of a single population, such as persons living in a certain region or employed by a certain company during a given period. Although study deaths are incident events often identified from a defined base population, the outcome variable in this design is prevalence of disease at death; we do not have the proper denominator to estimate the disease-specific mortality rate in any (base) population. Furthermore, exposure data are not obtained for the base population but for study deaths only.

In the conventional proportional mortality study, comparison deaths are all other causes of death occurring in the population. The traditional method of analysis is to compute the PMR, which is the proportion of exposed deaths resulting from the index disease divided by the proportion of unexposed deaths resulting from the index disease (6). Alternatively, the data are analyzed as in a case-control study: the researcher computes the mortality odds ratio (46,47). An important advantage of the alternative approach is that the comparison (control) group might be selected to include only those diseases thought to be unrelated to exposure status. This design strategy, which also should be used in a proportional morbidity study, can help reduce selection bias by making the comparison group more representative of the base population. Another advantage is that it allows use of the many analytic techniques developed for case-control studies (48,49).

Strengths of a Case-Control Design. The major advantage of the case-control design over other basic designs is its efficiency for studying rare diseases, especially diseases with long latent periods. A greater proportion of study costs for collecting exposure and covariate data can be devoted to cases rather than expending most available resources on noncases. Thus, given a fixed sample size, case-control sampling in a study of a rare disease enhances the precision and power for estimating and testing the exposure effect. In addition, some case-control studies, particularly proportional mortality designs, tend to be relatively inexpensive and feasible because they can be based on readily available data sources.

Weaknesses of a Case-Control Design. A key issue in the design of case-control studies is the method and procedures for selecting controls. Ideally, we would like to make each study population-based, such that every new case occurrence in a well-defined base population is immediately identified by the investigators and controls are sampled randomly from the base population. In practice, however, this goal is not so easily accomplished, especially when the base is a large, dynamic population that cannot be examined periodically. Even population surveillance and registry systems, when they exist, are likely to be very incomplete for many diseases, such as prostate cancer, Alzheimer's disease, and ischemic heart disease. If exposed cases are more likely or less likely to be detected or reported than unexposed cases, the resulting effect estimate will be biased. In a cohort study, this detection problem would manifest as differential disease misclassification bias; but in a case-control study, the detection problem produces a form of selection bias that might involve no disease
misclassification in the total sample and, therefore, cannot be corrected after subject selection (50). To prevent such detection bias, the investigator might select controls who, purportedly, underwent the same degree of medical surveillance as did study cases (51) (e.g., persons screened for the disease or patients treated for other related conditions). Unfortunately, this approach could introduce another problem by selecting for the control group individuals with other exposure-related conditions (see the discussion of proportional morbidity studies). The end result might be, for example, to overcompensate for potential detection bias, producing net bias in the opposite direction. In general, in the absence of perfect population-based methods, investigators must select controls to reflect the expected magnitudes of various potential selection problems.

When there is relatively little variability of exposure in the base population, we expect imprecise estimation of the exposure effect, even if the exposure is a strong risk factor for the disease. Although such inefficiency is usually quite apparent in cohort studies, it may not be so apparent in case-control studies, especially when the investigator does not know the exposure distribution in the base population. For example, if environmental exposure levels are high throughout the region of the base population, a comparison of cases and controls would result in an unstable estimate of effect and low power. As in cohort studies, the problem is not one of bias. Limited variability of exposure is likely to occur when exposure status for individual subjects is measured ecologically by assigning to each subject the exposure level observed for the area in which that subject lives or works. Other problems accompanying ecologic measurement are discussed in "Ecologic Designs."

Two-Stage Designs. Just as cohort studies are inefficient for studying rare diseases, case-control studies are inefficient for studying rare exposures. When both disease and exposure are rare, therefore, any basic design might require a very large sample size to ensure adequate power. One solution to this problem is a two-stage design: stage 1 is a basic design in which data are collected on exposure and disease variables only; in stage 2, covariate data and possibly more refined exposure data (with less measurement error) are collected on separate random samples of exposed cases, exposed noncases, unexposed cases, and unexposed noncases, all of which are identified from stage 1 results (52,53). Sampling fractions for stage 2 are set larger for those exposure-disease groups that contain fewer subjects in stage 1. Thus, the investigator can obtain approximately equal numbers of the four exposure-disease groups in stage 2. Stage 1 results are used to estimate the crude (unadjusted) exposure effect, and stage 2 results are used to estimate the effect adjusted for covariates and possibly a more refined exposure effect. The analysis of stage 2 data takes the sampling fractions into account (52-56). The two-stage design is also advantageous when the cost of obtaining covariate data is large relative to the cost of obtaining exposure and disease data or when covariate data are missing on a majority of subjects (52,54).

Case-Crossover Design. A standard crossover design is an experiment or quasi-experiment in which each subject receives both the experimental and control treatments at different times (i.e., each subject serves as his or her own control) (57). Such designs are seldom used in environmental epidemiology because manipulation of treatment status (with or without randomization) is usually unethical or infeasible and because the outcome is usually a rare event. Recently, Maclure (58) proposed an observational analogue of the crossover study called the case-crossover design, which may be regarded as a special type of pairwise-matched, case-control study. This type of design can be used to estimate the possible transient effect of a brief exposure (e.g., coffee drinking) on the subsequent occurrence of a rare acute-onset disease (e.g., myocardial infarction) that is hypothesized to occur within a short time after exposure (i.e., during the effect period). All subjects are newly detected cases that serve as their own controls. That is, for each case, the observed odds of being exposed during the effect period (e.g., one hour before disease onset) is compared with the expected odds of being exposed during any random period of the same duration (assuming no exposure effect). The expected exposure odds is estimated from the subject's report of his or her usual exposure frequency before disease occurrence. For example, if a person drinks coffee twice each day and the effect period is 1 hr, the expected odds of exposure during any 1-hr period is 1/11 (2 exposed hours versus 22 unexposed hours per day). Thus, we would expect that, for every 12 cases who drank coffee twice each day, one case would have occurred by chance within 1 hr of exposure. Maclure recommends using standard methods of matched analysis for person-time (cohort) data to combine data from all cases; however, this approach needs further development to handle the temporal autocorrelation of outcome status (i.e., a case is either exposed during the effect period or unexposed, but it cannot be both). Although the case-crossover design has not yet been used to examine the possible short-term effect of an environmental exposure, this type of study is feasible if we can measure such exposures.

Genetic Study

The study of variation among individuals or groups in their sensitivity to environmental agents is one of the aims of environmental epidemiology. Such variation might be due to differences in host characteristics, including genetic factors, or to interactions with other exposures. A complete survey of the methods used to study the genetic determinants of disease would be beyond the scope of this report; instead, we will focus on the approaches that might be used to address the issue of gene-environment interactions.

Three basic types of information might shed light on the genetic component of such interactions: a classification of the subjects' genotypes at a major locus for disease susceptibility; some observable host characteristic (phenotype) that is genetically determined and linked with the genotype that was responsible for sensitivity; or family history as a surrogate for genetic (or shared environmental) influences. The choice of study design will depend upon which of these is sought.

The first is the most powerful approach, and its feasibility will grow as more and more genes are identified and assays for them become available. If the genotype is observable, it can be considered simply another risk factor, and any of the basic design and analysis strategies used in epidemiology are applicable. For example, Caporaso et al. (59) reported a case-control study of lung cancer, in which the rate of metabolism of the antihypertensive drug debrisoquine was taken as a phenotypic marker for a gene in the cytochrome P450 system that is responsible for metabolism of carcinogens. It was shown that intermediate and high metabolizers were at higher risk of lung cancer overall and that there was an interaction between metabolic rate and exposure to occupational carcinogens and smoking. In this example, the genotype was not observed directly but inferred from the phenotype, but recent advances in molecular genetics, such as the use of restriction fragment length polymorphism, are making direct observation increasingly feasible.

Identifying host characteristics that interact with environmental exposures can be done in essentially the same way. A
familiar example might be skin color as a marker for sensitivity to sunlight in the production of melanoma. No extensions of standard epidemiologic methods would be needed to address this question. The only subtlety in this case arises when the gene determining the host characteristic is not the disease susceptibility locus but only linked to it (i.e., nearby on the same chromosome). A particular marker allele might be associated with the disease in one family and a different allele in another family, but in both families, the marker and the disease would be inherited together. This possibility requires family data and the techniques of linkage analysis. To date, such analyses have been applied only to the study of genetic effects without reference to the environmental covariates, but statistical techniques that would assess such combined effects recently have become available (60).

Before trying to identify a specific major gene that is related to sensitivity to environmental exposures, one should assess whether there is any evidence that such sensitivity has a genetic basis. This also requires collection of family data, but unlike the standard analyses aimed at examining the main effect of genetics, one would also want to examine interactions between family history and environmental exposures. Geneticists commonly assemble a small number of very large pedi-

case-control approach requires data only on the originally sampled cases and controls. Clearly, the cohort approach is more informative, but the conditions under which the additional effort warrants the gain in information about gene-environment interactions have not been investigated.

The two major design issues to be addressed in such studies are the method of ascertainment of families and the information to be collected on family members. The former has been discussed at great length in the genetics literature. The basic problem is that if families are ascertained through affected probands, families with multiple cases will tend to be overrepresented. Therefore, various corrections for ascertainment are applied in the standard methods of genetic analysis. The relevance of these approaches to the epidemiologic designs for gene-environment interactions requires further research. Often, only very limited information is collected about family history in epidemiologic studies. The minimal information should be an enumeration of all affected family members together with the sex and age of each family member at risk; as discussed above, information on major risk factors for all family members at risk may also be desirable. Because larger and older families are likely to have more familial cases, the presence or num-

Ecologic designs are therefore incomplete (2) in the sense that they lack certain information ordinarily contained in the basic designs. As noted in "Problems in Environmental Epidemiology," the primary reason for this missing information in environmental epidemiology is our inability or lack of resources to accurately measure environmental exposures in large numbers of individuals. Thus, the widespread use of ecologic designs in environmental epidemiology reflects a fundamental problem of exposure measurement. In addition, ecologic studies represent an inexpensive design option for linking available data sets or record systems, even when exposures are measured at the individual level. The appeal of this alternative is that aggregate summaries of many exposures, including sociodemographic and other census variables, are often available for the same regions that are used to summarize morbidity and mortality data.

With the inclusion of covariate data in an ecologic study, the analysis may be only partly ecologic. This condition occurs when the joint distribution of two or more, but not all, variables is known within groups. For example, suppose we want to examine the possible effect of radon exposure on lung cancer incidence, controlling for age (the covariate). Although we might
grees, sometimes selected to maximize the ber of familial cases is not suitable as a fam- know the age distribution of all new cases
chances that a major gene is operating in the ily history covariate. Moreover, expressing and all persons at risk within each county
families, and subject them to segregation the number affected as a proportion does (from tumor registry and census data), we
analysis to study the mode of inheritance. not solve the problem because multiple would usually not know the within-county
Again, these analyses seldom account for cases in large families are more informative association between radon exposure and
environmental covariates and interactions, than single cases in small families. A more the other two variables. Sometimes data
although such methods are now available. In appropriate comparison is between the sets like this are analyzed with the individ-
contrast, epidemiologists begin with a large observed number of familial cases and the ual as the unit of analysis, where each indi-
population-based series of cases and controls eqxcted number based on the person-time at vidual is assigned the average radon exposure
and restrict attention to the first- and some- risk, in which the comparison is adjusted for level that was measured for the region in
times second-degree family members. Their age, sex, and other important risk factors (62). which he or she lives. Such ecologic measure-
analyses usually are limited to a simple family ment of exposure means that there is likely to
history covariate (e.g., presence of an affected Ecologic Designs be substantial error in measuring the individ-
member, number of affected members, etc.) An ecologic or aggregate study is one in ual's exposure to radon, which could result in
in standard multivariate risk-factor models, which exposure levels of individuals are not information bias of effect estimation.
possibly but seldom including interactions linked to disease occurrence of those indi-
with environmental covariates. Susser and viduals. The net result is that the unit of Types of Ecologic Studies
Susser (61) have discussed two basic statistical analysis is usually the group, typi- Ecologic studies may be classified into five
approaches to such data: in the case-con- cally persons living in a geographic area such design types that differ in several ways,
trol approach, cases are compared with con- as a census tract, county, or state. For each including methods of subject selection and
trols in terms of their family histories; in the group or region, therefore, we know the methods of analysis (2,63).
cohort approach, the incidence of disease in average exposure level or distribution and Exploratory Studies. In exploratory
the exposed (case) families is compared with the disease rate, but we do not know the ecologic studies, we compare the rate of
the incidence in unexposed (control) fami- joint distribution of these two variables. disease among many contiguous regions
lies. Either approach could easily be Given a dichotomous exposure, for example, during the same period, or we compare the
extended to incorporate environmental we would not know the numbers of exposed rate over time in one region. In neither
covariates and their interactions with fam- and unexposed cases in each group. Thus, approach are exposures to specific environ-
ily history. The only difference is that the we cannot estimate the exposure effect mental factors measured (for individuals or
cohort approach requires covariate data on directly by comparing the disease rate for groups). The purpose is to search for spa-
all of the family members, whereas the exposed and unexposed populations. tial or temporal patterns that might suggest

Environmental Health Perspectives Supplements 31


Volume 101, Supplement 4, December 1993
MORGENSTERN AND THOMAS

an environmental etiology or more specific etiologic hypotheses.

The simplest type of exploratory study of spatial patterns is a graphical comparison of relative rates across all regions (i.e., mapping study), possibly accompanied by a statistical test for the null hypothesis of no geographic clustering (64). In mapping studies, however, a simple comparison of estimated rates across regions is often complicated by two statistical issues. First, regions with smaller numbers of observed cases show greater variability in the estimated rate; thus, the most extreme rates tend to be estimated for those regions with the fewest cases. Second, nearby regions tend to have more similar rates than do distant regions (i.e., positive autocorrelation). A statistical method for dealing with both complications involves empirical Bayes estimation of rates using an autoregressive spatial model (65).

In certain exploratory studies of spatial patterns, regions are characterized in terms of general ecologic indicators such as degree of urbanization (urban versus rural), degree of industrialization (agricultural versus nonagricultural), population density, socioeconomic status, and ethnic diversity. The analysis of these data usually involves comparisons of regions grouped by one or more ecologic indicators. This approach resembles the statistical methods used in multiple-group studies (see "Interpretation of Results").

An exploratory ecologic study was conducted by Mahoney et al. (66), who compared age-standardized mortality ratios for cancers, by sex, among all cities and towns in New York State (exclusive of New York City) between 1978 and 1982. By grouping these regions by quintile of population density, they examined the associations between density and deaths from all cancer sites and selected sites, by sex. They found linear associations between increasing population density and total cancer mortality in both men and women. Because population density may reflect various risk factors for different cancers, the authors acknowledge that their findings are consistent with several alternative explanations.

An exploratory study of temporal patterns is generally done by comparing disease rates for a geographically defined population over a period of at least 20 years. A common statistical or graphical approach for analyzing such longitudinal data is cohort analysis (not to be confused with the analysis of data from a cohort study) (2). The objective of this approach is to estimate the separate effects of three time-related variables on disease occurrence: age, period (calendar time), and birth cohort (year of birth). Because of the linear dependency of these three variables (birth cohort = period − age), there is an inherent statistical limitation (the identification problem) in the interpretation of cohort-analysis results. The problem is that each data set has alternative explanations with respect to the combination of age, period, and cohort effects. The only way to decide which interpretation should be accepted is to consider the findings in light of other (prior) knowledge of the disease and its determinants.

A cohort analysis was conducted by Lee et al. (67) on melanoma mortality among white males living in the United States between 1951 and 1975. They concluded that the apparent increase in the melanoma mortality rate during that period was due primarily to a cohort effect. That is, persons born in more recent years carried with them throughout their lives a higher mortality rate than did persons born earlier. In a subsequent review paper, Lee (68) speculates that the cohort effect might reflect the impact of changes in a major risk factor operating during youth, such as sunlight exposure or burning.

Space-Time Cluster Study. Space-time clustering refers to the interaction between place and time of disease occurrence, such that cases that occur close in space also occur close in time (2). Evidence of space-time clustering may suggest person-to-person transmission of an infectious agent or the effects of point-source exposures, depending on the disease and the cluster pattern. The analytic search for space-time clusters requires special statistical techniques that may or may not incorporate information on the base population and covariates (69,70). Although the unit of analysis for these methods is usually the individual, space-time cluster studies are classified here with ecologic designs because closeness in space and time is a proxy measure for environmental exposures (or at least for the opportunity for exposure). Thus, use of place and time information is analogous to use of spatial or temporal indicators in the exploratory study.

Space-time cluster analyses may be used when members of a community perceive a cluster or excess number of cases of one or more diseases in their area. This activity is often motivated by the suspicion that the apparent cluster is caused by a specific environmental exposure, such as chemical waste, pesticides, or electromagnetic fields. When investigation begins, the first steps are to verify the diagnoses of all reported cases and identify any additional cases in the cluster area, which must be defined. In addition to space-time cluster analyses, the investigators will probably want to compare the disease rate in the cluster-area population with the rate in another population thought to be unexposed (retrospective cohort study), and they may conduct a population-based case-control study to identify risk factors for the disease.

Multiple-Group Study. In a multiple-group ecologic study, we assess the ecologic association between average exposure level or prevalence and the rate of disease among many groups or regions. This is the most frequently used ecologic design in environmental epidemiology. Studies are usually conducted by linking separate sources of data. For example, census and tumor-registry data might be combined to estimate cancer rates for all counties in a state; other state records or surveys might be used to estimate average exposure levels by county. Statistical methods for estimating exposure effects in multiple-group studies are discussed in "Interpretation of Results" and by Prentice and Thomas in this issue.

Hatch and Susser (71) conducted a multiple-group ecologic study to examine the association between background gamma radiation and childhood cancers between 1975 and 1985 in the region surrounding the Three Mile Island nuclear plant. Using data from a 1976 aerial survey, they estimated the average radiation level for each of 69 tracts in the study region. The results of their analyses showed a positive association between radiation level and the incidence of childhood cancers. The authors were cautious in making causal inferences, however, because the large effect observed for solid tumors, as well as leukemias, was not expected.

Time-Trend Study. In time-trend (or time-series) studies, we assess the ecologic association between change in average exposure level or prevalence and change in disease rate in one geographically defined population. The assessment may be done by simple graphical displays or by more formal statistical techniques (72-75). With either approach, however, the interpretation of findings is often complicated by two issues. First, changes in disease classification and diagnostic criteria can produce very misleading results. Second, the latency of the disease with respect to the exposure of interest may be long, variable across cases, and/or unknown to the investigator; thus, employing an arbitrary or empirically defined lag between the two trends can also produce very misleading results (76).

Darby and Doll (77) compared the trends of average annual absorbed doses of radiation fallout from weapons testing and childhood leukemia rates in three European countries between 1945 and 1985. Although the
leukemia rates varied over time in each country, they found no convincing evidence that these changes were attributable to changes in fallout radiation.

Mixed Study. The mixed ecologic design combines the basic features of the multiple-group study and the time-trend study. The objective is to assess the ecologic association between change in average exposure level or prevalence and change in disease rate among many groups. Thus, two types of comparisons are made simultaneously: change over time within groups and differences among groups.

For example, Crawford et al. (78) evaluated the hypothesis that hard drinking water (i.e., water containing more calcium and magnesium ions) is a protective factor for cardiovascular disease (CVD). They compared the absolute change in CVD mortality rate between 1948 and 1964, by age and sex, in 83 British towns. The towns were divided into three groups: a) five had experienced increases in water hardness; b) six had experienced decreases; and c) 72 had experienced little or no change in water hardness. In all sex-age groups, especially for men, the authors found an inverse association between trends in water hardness and CVD mortality. In middle-aged men, for example, the increase in CVD mortality was smaller in towns that made their water harder than in towns that made their water softer.

Interpretation of Results

Statistical analysis in a multiple-group study usually involves fitting the data to a mathematical model (see Prentice and Thomas, this issue). The outcome variable is a function of the disease rate in each group; predictors include the average exposure level or proportion exposed in each group plus other ecologic covariates, the effects of which the investigator wants to control. We show in "Control for Covariates" that these covariates need not be confounders (i.e., at the individual level within groups).

Results of the fitted model can be used to estimate the exposure effect, i.e., the same causal parameter we would like to have estimated had the study been conducted at the individual level (63,79,80). For example, suppose the exposure variable is the proportion exposed in each group and there are no covariates. Assuming a linear model, we can use weighted least-squares regression to estimate the slope ($b$) and intercept ($a$). The predicted disease rate in a group that is entirely exposed is then $a + b(1) = a + b$, and the predicted rate in a group that is entirely unexposed is $a + b(0) = a$; therefore, the estimated rate ratio is $(a + b)/a = 1 + b/a$. It is important to note that this estimation procedure implies extrapolating the results of the model to both extreme values of the exposure variable (proportions exposed of 0 and 1), either or both of which may lie well beyond the observed range. It is not surprising, therefore, that different model forms can lead to very different estimates of effect (81). In fact, certain model assumptions may lead to rate-ratio estimates that are negative and thus meaningless; in the linear model above, for example, a positive intercept $a$ combined with $a + b < 0$ gives $1 + b/a < 0$.

Ecologic Bias

The use of ecologic data to estimate causal parameters has a major methodologic limitation, variously called the ecological fallacy (82), aggregation bias (83), cross-level bias (84), or ecologic bias (85,86). Ecologic bias refers, in general, to the failure of ecologic estimates of effect to reflect the true effect at the individual level. Some of this bias may occur in individual-level studies of the same population, but some of it is due specifically to the aggregation of subjects into groups. More importantly, ecologic bias is likely to be more severe and less predictable than individual-level bias in estimating the same effect (63,81,86,87). It is very possible, for example, that an ecologic analysis of a (true) positive risk factor would produce an apparently protective effect.

The underlying problem of ecologic bias may be regarded as a special form of information bias resulting from within-group heterogeneity of exposure status, which is not captured in the analysis. For example, a positive linear relationship between proportion exposed and disease rate does not necessarily mean that exposed individuals are at greater risk for the disease than are unexposed individuals; rather, unexposed individuals may be at greater risk in groups containing proportionally more exposed individuals. The implication of this latter explanation is that an individual's group affiliation has an effect on disease occurrence that reflects more than just the individual's exposure status.

A mathematical understanding of ecologic bias was first provided for correlation coefficients by Robinson (88) and later extended to regression coefficients by Duncan et al. (89). Nevertheless, the conditions for valid ecologic estimation and the relationship between ecologic bias and other methodologic issues are still not well understood. Because the results of ecologic analyses are often used to influence policy decisions, as well as to make causal inferences, it is important for researchers to appreciate the complexities of ecologic inference.

Sources of Ecologic Bias. Ecologic bias is often confused with confounding, perhaps because regional differences in disease rates can be due to variation in the distribution of extraneous risk factors across regions. To clarify the confusion between these two concepts, Greenland and Morgenstern (86,87,90) show that ecologic bias can arise from three different sources.

Within-Group Confounding (At the Individual Level). The exposure effect may be confounded within groups (as described for nonecologic studies in "Sources of Epidemiologic Bias"). Thus, if the within-group effect is equally confounded by the same unmeasured risk factors in every group, we can expect the ecologic estimate of effect to be biased as well. In general, ecologic estimates will be biased in this way if the net within-group bias across groups (due to uncontrolled confounders) is not zero. It is possible, therefore, for positive confounding in certain groups to cancel negative confounding in other groups.

The other two sources of ecologic bias are unique to this design and can be understood by considering group (or group affiliation) as a nominal predictor of disease occurrence at the individual level.

Confounding by Group. Ecologic bias can occur when the disease rate in the unexposed population varies across groups. Since average exposure level also typically varies across groups, group is a confounder of the exposure effect at the individual level. This set of conditions may occur if one or more unmeasured risk factors are differentially distributed across groups, even if these risk factors are unrelated to exposure status within groups and, therefore, are not confounders at the individual level.

Effect Modification by Group. Ecologic bias can also occur when the exposure effect varies across groups, i.e., when group modifies the effect of the exposure at the individual level. This condition may result from extraneous risk factors (effect modifiers) being differentially distributed across groups or from misspecification of the model form used to analyze the data. Ecologic bias of this type tends to be more severe when there is little variability in average exposure across groups (85), even when the effect modification is relatively weak and there is no confounding by group.

Taken together, the above principles imply that there will be no ecologic bias if the disease rate in the unexposed population and the exposure effect do not vary across groups and if there is no net confounding within groups. Unfortunately, it is very unlikely that all of these conditions
will be met in one ecologic study. Although small departures from these conditions may not result in substantial bias (81,86), it is also possible that there will be little or no bias in certain studies when one or more of these conditions are not met.

If every group were completely exposed or unexposed, there would be no ecologic bias attributable to confounding or effect modification by group. Indeed, if all covariates were measured at the individual level, such a study would not be an ecologic design. Thus, to reduce ecologic bias, we should select regions that minimize within-region exposure variation and maximize between-region variation (63,81). One strategy for achieving these goals is to choose the smallest unit of analysis for which required data are available (e.g., census tracts or blocks). Unfortunately, certain data are seldom available at this level (e.g., personal behaviors and biomedical factors), and there is no guarantee that these smaller units are more homogeneous with respect to exposure status. Furthermore, use of smaller groups might increase the problem of migration between groups (see "Other Methodologic Problems").

Control for Covariates

In a study conducted entirely at the individual level, an extraneous risk factor produces bias (confounding) in effect estimation only if it is associated with exposure status in the base population (see "Sources of Epidemiologic Bias"). In a multiple-group ecologic study, however, an extraneous risk factor can produce ecologic bias even if it is not associated with exposure status within regions (at the individual level) (86,87,90). Such bias occurs typically because the ecologic association (across regions) between the exposure and risk factor produces confounding and/or modification of the exposure effect by group (see "Ecologic Bias"). Conversely, an extraneous risk factor that is a confounder within regions may not produce ecologic bias if the net within-group bias is zero (see "Ecologic Bias") or if the risk factor is ecologically uncorrelated with the exposure.

One method to control for extraneous risk factors in ecologic studies is to include predictor terms for these risk factors in the model (e.g., the proportion of smokers or the mean family income in each region). Unfortunately, even when such covariate data are available for all regions, ecologic adjustment usually cannot be expected to remove completely the ecologic bias produced by these risk factors. In fact, it is possible for such ecologic adjustment to increase bias (86).

The general conditions under which ecologic control for extraneous risk factors either increases or decreases bias have not been delineated. Yet, under certain restrictive conditions, ecologic control for covariates will produce unbiased estimates of the exposure effect, provided there are no other sources of bias (e.g., outcome misclassification). If the effects of the exposure and the covariate on disease rate are exactly additive within every region (i.e., the rate difference for each variable is constant across levels of the other variable) and if the rate conditional on both predictors is exactly the same in every region, ecologic regression of disease rate on the mean exposure and covariate levels (i.e., multiple linear regression) will lead to unbiased estimates of both effects (83,84). Under these conditions, group affiliation does not confound or modify the exposure effect at the individual level. However, as shown by Greenland (81), relatively minor deviations from perfect additivity (linearity) can lead to appreciable ecologic bias because ecologic rate ratios can be extremely sensitive to the choice of model form, in contrast to individual-level estimates. Furthermore, the two conditions noted above are only sufficient for no ecologic bias to occur; ecologic bias may be absent when either or both conditions are not met.

Richardson and Hemon (91) recently pointed out that there is another set of conditions under which ecologic control of covariates is possible. If a) the exposure and covariates are uncorrelated within regions, b) their effects on disease are multiplicative (i.e., the rate ratio for each variable is constant across levels of the other variable), and c) the rate conditional on both predictors is exactly the same in every region, then ecologic bias due to the covariates can be removed or largely reduced by including product terms in the linear model. Of course, such conditions are very difficult to verify in ecologic studies; if the exposure and covariates (other risk factors) are correlated within regions, the covariates will be confounders at the individual level and substantial ecologic bias can occur even with product terms in the model (81).

When the data are not entirely ecologic (see "Ecologic Designs"), rate standardization is another method often employed to adjust for extraneous risk factors in ecologic studies. For example, if the age distribution is known for cases and for the base population in every region, we can mutually standardize the rate in every region to the age distribution of a well-defined (standard) population (5); then we use the standardized rates as the outcome variable in the ecologic analysis. Unfortunately, this method does not always reduce ecologic bias due to the variables for which the rates are standardized; in fact, the result may be to increase bias appreciably (86,92). Standardization can be expected to reduce ecologic bias only if all variables in the model (i.e., disease and all predictors) are mutually standardized for those other confounders (e.g., age) not included as predictors in the regression model. This method is often not feasible, for example, when the investigator does not know the age distribution of exposed and unexposed populations within every region.

Other Methodologic Problems

In addition to ecologic bias and the related difficulties of controlling for extraneous risk factors, there are other methodologic problems with ecologic analysis, a few of which are addressed below.

Exposure Misclassification Bias. As noted in "Sources of Epidemiologic Bias," nondifferential misclassification of exposure status in individual-level studies nearly always results in bias toward the null value; e.g., the estimated rate ratio will be closer to one than is the true rate ratio. In multiple-group ecologic studies, however, this principle does not hold when the exposure variable is formed from the aggregated observations of all individuals in each region (e.g., the proportion exposed). Brenner et al. (93) have shown that nondifferential misclassification of a binary exposure within groups usually leads to overestimation of the rate ratio (away from the null value) in ecologic studies, and that this overestimation can be severe. This apparent contradiction between ecologic and individual-level studies can be understood by considering just two regions. Nondifferential exposure misclassification in both regions will produce an estimated difference in exposure prevalence that is smaller than the true difference, whereas the observed difference in disease rates is unaffected. Consequently, the estimated regression coefficient (slope) for the exposure variable in a linear ecologic model, which is the rate difference divided by the prevalence difference, will be overestimated, leading to overestimation of the rate ratio. Little is known about the impact, in ecologic studies, of within-group error in measuring continuous or multiple-category exposures.

Confounder Misclassification. In studies conducted at the individual level, misclassification of a confounder, if nondifferential with respect to exposure and disease, will usually reduce our ability to control for the confounder in the analysis (94,95). That is, adjustment will not completely eliminate the bias due to the confounder. In ecologic studies, however, nondifferential misclassification of a binary confounder within groups does not affect our ability to control for that confounder (96). Thus, surprisingly, nondifferential misclassification of a confounder is less problematic in ecologic studies, provided there is no ecologic bias, than in individual-level studies.

Collinearity. It is probably more common in ecologic studies than in other studies for two or more predictors to be highly correlated across groups (63,97,98). This issue is particularly relevant with environmental factors, such as the associations between levels of different contaminants in air or drinking water or associations between different socioeconomic indicators. The implication of such collinearities is that it is very difficult, perhaps impossible, to separate these effects statistically; analyses yield model coefficients with very large variances and often severely distorted estimates of effect.

Temporal Ambiguity of Cause and Effect. Use of incidence data in a cohort study usually implies that disease occurrence did not precede exposure to the hypothesized risk factor. Yet, in multiple-group or time-trend ecologic studies, use of incidence data provides no such assurance against this temporal ambiguity (63). This inferential problem is most troublesome when it is possible for disease to influence exposure status either at the individual level (see "Cohort Study") or at the ecologic level (e.g., interventions designed to reduce exposure levels in areas with high rates of disease).

The problem of temporal ambiguity in ecologic studies is further complicated by an unknown or variable latent period

approach needs more work to provide a reliable method of bias reduction.

Current Issues and Recommendations

A general goal of epidemiologic research is to obtain the most information about possible health effects with minimal and/or available resources. Given the difficulties in estimating effects of specific environmental exposures in human populations, this goal is not easily attained and optimal research strategies are not readily identified. Below, we highlight several current methodologic issues in environmental epidemiology and make some recommendations for future work.

Study Design. No single design best meets the objectives of every epidemiologic study. In practice, study objectives are shaped by many factors: current knowledge, previous findings, institutional mandates, societal values, personal preferences, etc. Although a prospective cohort study might be expected, in general, to produce less bias than would a hospital-based (proportional) case-control study, the latter design might be a rational choice in certain situations. Even an ecologic design, despite its limitations, might be appropriate; it may be the only practical option at a given time. The challenges of environmental epidemiology, therefore, cannot be solved simply by advocating the use of certain, more expensive study designs. In addition to committing more resources to the con-

confounders in the design phase and measuring them accurately in the study population. The prevention of selection bias, however, is not so straightforward because it depends on identifying all cases that occur in a well-defined (base) population at risk. When new cases occur infrequently or when it is otherwise impractical to re-examine enough individuals to detect all new events, the prevention of selection bias depends on population surveillance and monitoring systems, such as population-based tumor registries and industrial surveillance. Although these systems may be expensive to implement and operate, they are often necessary to reduce the threat of selection bias.

Unfortunately, population-based systems may not be sufficient to prevent selection bias with diseases for which detection depends critically on care seeking, symptom reporting, and complex differential diagnoses. A key problem is that not all persons with an illness recognize their symptoms and seek medical attention. Thus, exposure effects observed for these diseases in epidemiologic studies might reflect the effects of the exposure on illness behavior as well as the effects on illness occurrence (100).

Another solution to incomplete or inadequate case detection is to control analytically for methodologic covariates that reflect differences in illness behavior. For example, we might measure the individual's tendency to seek medical care and treat this variable as a confounder. The
between exposure and disease occurrence. duct of epidemiologic research, we need to measurement of this covariate should be
The investigator can only attempt to deal develop new designs to meet specific objec- independent of disease status; otherwise,
with this problem by establishing a specific tives more efficiently. For example, in covariate adjustment will probably lead to
lag period between observations of average "Case-Control Study," we discussed the bias toward the null value. This approach
exposure and disease rate. Even when the use of two-stage designs to investigate asso- needs further development and evaluation.
average latency is known, however, appro- ciations between rare diseases and rare An alternative strategy for studying dis-
priate data may not be available to accom- exposures and to control for covariates that eases that are difficult to detect in large pop-
modate the desired lag. are relatively expensive to measure. New ulations, such as musculoskeletal conditions,
Migration. Migration of individuals approaches are also needed to identify is another type of two-stage design. In the
into or out of the base population can cause intermediate variables in observational first stage, a large population is surveyed
selection bias in any type of epidemiologic designs, to evaluate interaction effects cross-sectionally or longitudinally by ques-
study, because migrants and nonmigrants (effect modification) more efficiently, and tionnaires or interviews to identify persons
may differ on both exposure prevalence and to deal with the problems of nonparticipa- with symptoms characteristic of the disease.
disease risk. Little is known about the mag- tion, nonresponse, and noncompliance. The second stage involves case-control sam-
nitude of this bias or how it can be reduced Another need in environmental epidemiol- pling of the population to compare persons
in ecologic studies, especially when studying ogy is to understand better the relationship with and without these symptoms (i.e., cases
diseases with long latent periods. One between acute biological changes and and controls). In this stage, subjects are
approach might be to use larger geographic chronic health effects. For example, we given more definitive diagnostic tests to
groups (e.g., states instead of counties as might combine experimental and observa- identify true cases of the disease. By com-
units of analysis) (99). Unfortunately, this tional methods to determine the extent to paring diagnostic test results between
approach is also likely to increase the poten- which short-term changes in pulmonary selected cases and controls, the investigators
tial for severe ecologic bias, because it function caused by exposure to air pollutants can assess the validity of their symptom-
makes the groups less homogeneous with lead to chronic respiratory disease (2). based criteria, suggest improvements in din-
respect to exposure (see "Ecologic Bias"). Bias Reduction. In nonrandomized ical diagnosis, and estimate exposure effects.
Another approach might be to incorporate studies, it is important for the investigator The latter objective requires the develop-
available data on the distributions of resi- to deal with confounding in the analysis. ment of statistical methods appropriate for
dential durations within regions, but this This is achieved by identifying potential the sampling strategy.
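The second stage of this design tests only a sample of each symptom group, so its results have to be reweighted before they describe the surveyed population. The following is a minimal sketch of one simple weighting scheme and is not a method taken from this article. It assumes simple random sampling within the symptomatic and asymptomatic strata at stage 2 and a perfectly valid definitive test. From those results it estimates the sensitivity, specificity, and positive predictive value of the symptom criterion. The `TwoStageSurvey` class, its field names, and all counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TwoStageSurvey:
    """Hypothetical counts from a two-stage survey (field names are illustrative)."""

    n_symptomatic: int           # stage 1: persons meeting the symptom criterion
    n_asymptomatic: int          # stage 1: persons not meeting it
    tested_symptomatic: int      # stage 2: symptomatic persons given the definitive test
    tested_asymptomatic: int     # stage 2: asymptomatic persons given the definitive test
    confirmed_symptomatic: int   # stage 2: true cases among tested symptomatic persons
    confirmed_asymptomatic: int  # stage 2: true cases among tested asymptomatic persons


def symptom_criterion_validity(s: TwoStageSurvey) -> dict:
    """Estimate validity of the symptom criterion by reweighting stage-2 results.

    Assumes simple random sampling within each symptom stratum at stage 2, so each
    tested person stands for (stratum size / number tested) persons from stage 1.
    """
    # Proportion of true cases within each stratum, estimated from the stage-2 sample
    case_fraction_sym = s.confirmed_symptomatic / s.tested_symptomatic
    case_fraction_asym = s.confirmed_asymptomatic / s.tested_asymptomatic

    # Scale those proportions up to the full stage-1 strata
    cases_sym = s.n_symptomatic * case_fraction_sym
    cases_asym = s.n_asymptomatic * case_fraction_asym
    noncases_sym = s.n_symptomatic - cases_sym
    noncases_asym = s.n_asymptomatic - cases_asym

    total = s.n_symptomatic + s.n_asymptomatic
    return {
        "sensitivity": cases_sym / (cases_sym + cases_asym),
        "specificity": noncases_asym / (noncases_sym + noncases_asym),
        "positive_predictive_value": case_fraction_sym,
        "prevalence": (cases_sym + cases_asym) / total,
    }


survey = TwoStageSurvey(
    n_symptomatic=1000, n_asymptomatic=9000,
    tested_symptomatic=200, tested_asymptomatic=300,
    confirmed_symptomatic=120, confirmed_asymptomatic=6,
)
print(symptom_criterion_validity(survey))
```

With these made-up counts, the weighted sensitivity is about 0.77. Pooling the tested persons without weights (120 of the 126 confirmed cases were symptomatic) would suggest about 0.95. The pooled figure is too high because asymptomatic persons were tested at a much lower sampling fraction (300 of 9,000 versus 200 of 1,000). Estimating exposure effects from such data would presumably need to account for the same sampling fractions.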


Quality of Measurement. As noted earlier (see "Problems in Environmental Epidemiology"), a major challenge in environmental epidemiology is to measure accurately each individual's exposure to suspected and known risk factors for the disease under study. In the absence of previously validated and inexpensive methods for measuring exposures and covariates in large groups, it has become common practice to use more than one method or source of information to measure these variables. Nevertheless, it usually is not clear how different methods or sources of information should be combined, or what data should be combined, to minimize measurement errors and estimation bias (4,101,102). We need more methodologic research in this area to provide guidelines for the measurement of specific exposures in particular types of populations. One approach that might be pursued with environmental exposures is to combine ecologic data with self-reported data on individual behaviors. For example, suppose we collect ecologic data on pesticide spraying and distribution throughout a large region. We could then obtain from subjects the location of their homes; the type and location of their work; their use of drinking water; and how often they swim, fish, and participate in other activities that would affect their exposure to pesticides in the region.

Frequently, an accurate method does exist for measuring an exposure, but the application of this method to all subjects in a population is prohibitively expensive or infeasible. In such cases, many investigators rely on less accurate methods for the total sample and use the more accurate method in a subsample of subjects. Assuming the accurate method is perfectly valid (i.e., the gold standard), the results of the validation substudy are used to quantify the amount of measurement error, which is then used in the total sample to correct for misclassification bias involving the imperfect measure of exposure. Some important issues need to be considered to make this approach advantageous. First, how many subjects and what proportion of the total sample should be included in the substudy (103,104)? Second, how should we correct for exposure misclassification in the analysis, especially when the accurate method may not be perfectly valid or when the subsample is not representative of the total sample (see also Prentice and Thomas, this issue)?

Ecologic Inference. Because of inherent problems of measurement, most epidemiologic studies of environmental exposures are at least partly ecologic. When all data, except a single exposure, are obtained at the individual level, however, the ecologic problem amounts to possible misclassification bias, which is well understood and often predictable. Yet, when the unit of analysis is the group, the resulting ecologic bias is far less predictable and can be relatively severe in magnitude, especially when other sources of bias are present. Thus, in general, ecologic analyses do not provide very accurate estimates of effect. To make ecologic findings more informative, therefore, we need more theoretical work to specify the conditions for which ecologic estimates can be expected to be reasonably valid. With this information, we might then collect additional data to check those key assumptions or to correct ecologic estimates. For example, by obtaining detailed individual-level data on the exposure and certain covariates in samples of selected groups, we might be able to determine the limits of ecologic bias in estimating the exposure effect (see "Control for Covariates" and Prentice and Thomas, this issue).

Essentially, ecologic bias (aside from within-group bias) occurs because group affiliation or the average exposure level of the group affects disease occurrence independently of exposure status at the individual level (see "Ecological Bias"). The structural effects of such ecologic variables, if they can be separated from other effects at the individual level, might be informative, rather than just a source of error. Thus, by including both ecologic and individual-level predictors (possibly of the same exposure) in the analysis, we might enhance our understanding of disease occurrence. This type of contextual or multilevel analysis has been used extensively in social science research (105-108) but rarely in epidemiologic research (109). In addition, if the effect of a risk factor is known from previous research, the results of an ecologic analysis involving that risk factor could be used to evaluate the potential or realized impact of a population intervention, which may not be completely estimable at the individual level (63). A more profound understanding of ecologic bias, therefore, could yield benefits to other public-health research.

Gene-Environment Interactions. Because both genetic and environmental factors contribute to the etiology of most diseases, we would typically expect factors of each type to confound and/or modify the effect of the other. We know, for example, that a combination of both environmental/personal factors and genetic susceptibilities is sufficient for the development of certain diseases. Yet standard methods of epidemiologic research and population genetics have not been well integrated (110). As indicated in "Ecological Bias," we need new methods for incorporating environmental variables in genetic (e.g., linkage) analyses of pedigree data. We also need to understand better the relationship between those parameters estimated in pedigree studies and the effect parameters estimated in standard epidemiologic studies; and we need to understand better how the estimates of gene-environment interactions in pedigree studies are biased by confounding, measurement error, and family selection (ascertainment). With this understanding, we can devise new methods to prevent or control bias. Analogously, the use of family data in standard epidemiologic designs (e.g., history of disease and/or its risk factors in relatives) requires further development in order to handle differences in family size and composition among subjects. With the recent advances in molecular genetics, the integration of epidemiology and population genetics is likely to become more important in the future.

Sample Size and Power. As noted in "Problems in Environmental Epidemiology," epidemiologic studies of environmental exposures often require large sample sizes to detect risk-factor effects with sufficiently small statistical error. To address this concern, researchers are usually expected to justify their proposed sample size by estimating the power of their study for testing one or more major hypotheses (i.e., the probability of detecting an association of at least a certain magnitude with a designated Type I error, or alpha level, typically set at 5%). This is a rather straightforward procedure when the power estimation is applied to two dichotomous variables (exposure and disease) (1,4,111). Yet all observational studies require more complicated analyses to make causal inferences, e.g., to deal with polytomous, continuous, or time-dependent exposures; covariate adjustment; the assessment of interaction effects; matching; and other special design features. Although methods of power estimation do exist for many of these complicating features, they require additional specifications (assumptions) about which the investigator is not likely to have adequate information. Further development of these methods would be useful, therefore, to identify techniques that are both practical and informative in specific situations, including ecologic studies, for which sample size requirements have received little attention.

One parameter the investigator must specify to justify the proposed sample size is the magnitude of effect expected in the data or the minimum effect regarded as important to detect. In the absence of previous epidemiologic studies involving similar exposure levels, the expected effect is generally specified rather arbitrarily (e.g., a


rate ratio of 2). Sometimes, however, there are exposure-response findings from animal studies or occupational studies with higher exposure levels, which could be used to estimate the environmental exposure effect expected in the base population. This approach, which also requires further development, might allow research funds to be allocated more judiciously.

Data Analysis. Many of the recent developments and ideas for new study designs and data collection that were discussed in this article require parallel developments in statistical methods. For example, the analyst might have to deal with complex sampling strategies (as in two-stage designs); missing, misclassified, and/or aggregated data on relevant variables; time-dependent covariates; lag periods between first exposure and disease detection; incomplete case detection; and a limited sample size that severely restricts the number of covariates treated simultaneously. Several of these issues are covered further by Hatch and Thomas and by Prentice and Thomas in this issue.
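The validation-substudy correction described under "Quality of Measurement" can be illustrated with the simplest case: a dichotomous exposure measured in a case-control study. The sketch below is not the authors' method. It makes four assumptions: misclassification is nondifferential, the gold standard is perfectly valid, the substudy's sensitivity and specificity apply to the whole sample, and those two values are known constants. The last assumption ignores the statistical uncertainty the text raises for substudy design. Under these assumptions, each observed exposed count satisfies observed = Se × true + (1 − Sp) × (n − true), and the code solves that equation for the true count. The function names and all counts are hypothetical.

```python
from typing import Tuple


def validation_se_sp(tp: int, fn: int, fp: int, tn: int) -> Tuple[float, float]:
    """Sensitivity and specificity of the imperfect measure in the validation
    subsample, taking the accurate method as the gold standard.

    tp: exposed by both methods; fn: exposed by gold standard only;
    fp: exposed by the imperfect measure only; tn: unexposed by both.
    """
    return tp / (tp + fn), tn / (tn + fp)


def corrected_exposed(observed_exposed: float, n: float, se: float, sp: float) -> float:
    """Solve observed = se * true + (1 - sp) * (n - true) for the true exposed count."""
    denom = se + sp - 1.0
    if denom <= 0:
        raise ValueError("correction requires sensitivity + specificity > 1")
    return (observed_exposed - (1.0 - sp) * n) / denom


def corrected_odds_ratio(cases_exp: int, n_cases: int, controls_exp: int,
                         n_controls: int, se: float, sp: float) -> float:
    """Odds ratio after back-correcting exposure counts in cases and controls,
    assuming the same (nondifferential) se and sp apply to both groups."""
    a = corrected_exposed(cases_exp, n_cases, se, sp)
    c = corrected_exposed(controls_exp, n_controls, se, sp)
    b, d = n_cases - a, n_controls - c
    if min(a, b, c, d) <= 0:
        raise ValueError("corrected counts are not all positive; assumptions conflict with data")
    return (a * d) / (b * c)


se, sp = validation_se_sp(tp=80, fn=20, fp=10, tn=90)  # 0.8 and 0.9
# Hypothetical main study: 76 of 200 cases and 82 of 400 controls classified as exposed
observed_or = (76 * (400 - 82)) / ((200 - 76) * 82)
corrected_or = corrected_odds_ratio(76, 200, 82, 400, se, sp)
print(round(observed_or, 2), round(corrected_or, 2))  # 2.38 3.78
```

In this constructed example, the observed odds ratio is biased toward the null, and the correction recovers the value built into the hypothetical data (34/9). With real data, the corrected counts can come out negative when the assumed sensitivity and specificity do not fit. The function raises an error in that case rather than report a meaningless estimate.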
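The point made under "Ecologic Inference" can be shown with a small constructed example. Here, ecologic bias arises because group affiliation affects disease occurrence independently of individual exposure. In three hypothetical groups, exposed persons have exactly twice the rate of unexposed persons in the same group, so the individual-level rate ratio is 2 everywhere. The sketch then fits a linear model of group rate on exposure prevalence across groups, one common form of ecologic regression, which is assumed here rather than taken from the article. The implied ecologic rate ratio is the fitted rate in a fully exposed group divided by the fitted rate in a fully unexposed group, (b0 + b1) / b0. All rates and prevalences are invented for illustration.

```python
from typing import List


def group_rates(prevalences: List[float], baseline_rates: List[float],
                individual_rr: float) -> List[float]:
    """Disease rate in each group when unexposed members have the group's baseline
    rate and exposed members have baseline * individual_rr."""
    return [r0 * (1 - p) + r0 * individual_rr * p
            for p, r0 in zip(prevalences, baseline_rates)]


def ecologic_rate_ratio(prevalences: List[float], rates: List[float]) -> float:
    """Fit rate = b0 + b1 * prevalence across groups by least squares and return
    the implied rate ratio (b0 + b1) / b0 for fully exposed vs unexposed groups."""
    n = len(prevalences)
    mean_x = sum(prevalences) / n
    mean_y = sum(rates) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(prevalences, rates))
    sxx = sum((x - mean_x) ** 2 for x in prevalences)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return (b0 + b1) / b0


prevalences = [0.1, 0.5, 0.9]
# Same baseline rate in every group: the ecologic estimate recovers the true ratio of 2
rr_same = ecologic_rate_ratio(prevalences, group_rates(prevalences, [1.0, 1.0, 1.0], 2.0))
# Baseline rate rises with the group's exposure prevalence: the estimate is inflated
rr_rising = ecologic_rate_ratio(prevalences, group_rates(prevalences, [1.0, 1.5, 2.0], 2.0))
print(round(rr_same, 2), round(rr_rising, 2))  # 2.0 5.85
```

The individual-level effect is identical in both scenarios. Only the between-group variation in baseline rates changes, yet the ecologic estimate nearly triples. This is why the text calls ecologic bias far less predictable than individual-level misclassification bias.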
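The "rather straightforward" power calculation for two dichotomous variables, mentioned under "Sample Size and Power," can be sketched with a standard normal approximation. The test is two-sided and compares disease risk in equal-sized exposed and unexposed groups. It uses a pooled variance under the null hypothesis and unpooled variances under the alternative, with no continuity correction. This is one common textbook formula and is not claimed to be the exact method of references 1, 4, or 111. The code makes two further assumptions: the effect (e.g., the rate ratio of 2 mentioned in the text) is treated as a risk ratio over a fixed follow-up, and the 5% baseline risk used below is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(risk_unexposed: float, risk_ratio: float,
                          n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided z test comparing disease risk in equal-sized
    exposed and unexposed groups (normal approximation, no continuity correction)."""
    p0 = risk_unexposed
    p1 = risk_unexposed * risk_ratio
    if not (0 < p0 < 1 and 0 < p1 < 1):
        raise ValueError("risks must lie strictly between 0 and 1")
    z = NormalDist()  # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)
    p_bar = (p0 + p1) / 2
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n_per_group)         # SE under H0 (pooled)
    se_alt = sqrt((p0 * (1 - p0) + p1 * (1 - p1)) / n_per_group)  # SE under H1
    diff = abs(p1 - p0)
    # Probability, under H1, that the statistic falls in either rejection region
    return (z.cdf((diff - z_crit * se_null) / se_alt)
            + z.cdf((-diff - z_crit * se_null) / se_alt))


for n in (200, 500, 1000):
    print(n, round(power_two_proportions(0.05, 2.0, n), 2))
```

When the risk ratio is 1, the function returns the alpha level itself, which is a quick sanity check. Extending it to matching, covariate adjustment, or ecologic designs requires the additional assumptions the text warns about.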

REFERENCES
1. Schlesselman JJ. Case-control studies: design, conduct, analysis. New York: Oxford University Press, 1982.
2. Kleinbaum DG, Kupper LL, Morgenstern H. Epidemiologic research: principles and quantitative methods. Belmont, CA: Lifetime Learning Publications, 1982.
3. Miettinen OS. Theoretical epidemiology: principles of occurrence research in medicine. New York: John Wiley & Sons, 1985.
4. Kelsey JL, Thompson WD, Evans AS. Methods in observational epidemiology. New York: Oxford University Press, 1986.
5. Rothman KJ. Modern epidemiology. Boston, MA: Little, Brown and Co., 1986.
6. Checkoway H, Pearce N, Crawford-Brown DJ. Research methods in occupational epidemiology. New York: Oxford University Press, 1989.
7. Leaverton PE, ed. Environmental epidemiology. New York: Praeger, 1982.
8. Chiazze L Jr, Lundin FE, Watkins D, eds. Methods and issues in occupational and environmental epidemiology. Ann Arbor, MI: Ann Arbor Science Publishers, 1983.
9. Goldsmith JR, ed. Environmental epidemiology: epidemiological investigation of community environmental health problems. Boca Raton, FL: CRC Press, 1986.
10. Kopfler FC, Craun GF, eds. Environmental epidemiology. Chelsea, MI: Lewis Publishers, 1986.
11. Poole C. Would vs should in the definition of secondary study base (letter). J Clin Epidemiol 43:1016-1017 (1990).
12. Miettinen OS. The concept of secondary base (reply). J Clin Epidemiol 43:1017-1020 (1990).
13. Morgenstern H, Kleinbaum DG, Kupper LL. Measures of disease incidence used in epidemiologic research. Int J Epidemiol 9:97-104 (1980).
14. Breslow NE, Day NE. Statistical methods in cancer research, vol. 1. The analysis of case-control studies. Lyon: International Agency for Research on Cancer, 1980; 50-51.
15. Rubin DB. Bayesian inference for causal effects: the role of randomization. Ann Stat 6:34-58 (1978).
16. Greenland S, Robins JM. Identifiability, exchangeability, and epidemiological confounding. Int J Epidemiol 15:412-418 (1986).
17. Greenland S, Schlesselman JJ, Criqui MH. The fallacy of employing standardized regression coefficients and correlations as measures of effect. Am J Epidemiol 123:203-208 (1986).
18. Greenland S, Maclure M, Schlesselman JJ, Poole C, Morgenstern H. Standardized regression coefficients: a further critique and review of some alternatives. Epidemiology (in press).
19. Greenland S, Thomas DC. On the need for the rare disease assumption in case-control studies. Am J Epidemiol 116:547-553 (1982).
20. Greenland S, Thomas DC, Morgenstern H. The rare-disease assumption revisited: a critique of estimators of relative risk for case-control studies. Am J Epidemiol 124:869-876 (1986).
21. Armenian HK, Lilienfeld AM. Incubation period of disease. Epidemiol Rev 5:1-15 (1983).
22. Thomas DC. Pitfalls in the analysis of exposure-time-response relationships. J Chron Dis 40(Suppl 2):71S-78S (1987).
23. Shy CM, Kleinbaum DG, Morgenstern H. The effect of misclassification of exposure status in epidemiological studies of air pollution health effects. Bull N Y Acad Med 54:1155-1165 (1978).
24. Fleiss JL. Statistical factors in early detection of health effects. In: New and sensitive indicators of health impacts of environmental agents (Underhill DW, Radford EP, eds). Pittsburgh, PA: University of Pittsburgh, Center for Environmental Epidemiology, 1986; 9-16.
25. Dosemeci M, Wacholder S, Lubin JH. Does nondifferential misclassification of exposure always bias a true effect toward the null value? Am J Epidemiol 132:746-748 (1990).
26. Susser M, Stein Z. Third variable analysis: application to causal sequences among nutrient intake, maternal weight, birthweight, placental weight, and gestation. Stat Med 1:105-120 (1982).
27. Cook TD, Campbell DT. Quasi-experimentation: design and analysis issues for field settings. Boston, MA: Houghton Mifflin Co., 1979.
28. Greenland S. Randomization, statistics, and causal inference. Epidemiology 1:421-429 (1990).
29. Buck C, Donner A. The design of controlled experiments in the evaluation of nontherapeutic interventions. J Chron Dis 35:531-538 (1982).
30. Cornfield J. Randomization by group: a formal analysis. Am J Epidemiol 108:100-102 (1978).
31. Donner A, Birkett N, Buck C. Randomization by cluster: sample size requirements and analysis. Am J Epidemiol 114:906-914 (1981).
32. Zelen M. A new design for randomized clinical trials. N Engl J Med 300:1242-1245 (1979).
33. Ast DB, Schlesinger ER. The conclusion of a 10-year study of water fluoridation. Am J Public Health 46:265-271 (1956).
34. Greenland S, Morgenstern H. Classification schemes for epidemiologic research designs. J Clin Epidemiol 41:715-716 (1988).
35. Greenland S. Response and follow-up bias in cohort studies. Am J Epidemiol 106:184-187 (1977).
36. Robins J. A graphical approach to the identification and estimation of causal parameters in mortality studies with sustained exposure periods. J Chron Dis 40(Suppl 2):139S-161S (1987).
37. Robins J. The control of confounding by intermediate variables. Stat Med 8:679-701 (1989).
38. Newman SC. Odds ratio estimation in a steady-state population. J Clin Epidemiol 41:59-65 (1988).
39. Miettinen OS. The case-control study: valid selection of subjects. J Chron Dis 38:543-548 (1985).
40. Greenland S, Morgenstern H. Matching and efficiency in cohort studies. Am J Epidemiol 131:151-159 (1990).
41. Thompson WD, Kelsey JL, Walter SD. Cost and efficiency in the choice of matched and unmatched case-control study designs. Am J Epidemiol 116:840-851 (1982).
42. Greenland S. Adjustment of risk ratios in case-base studies (hybrid epidemiologic designs). Stat Med 5:579-584 (1986).
43. Mantel N. Synthetic retrospective studies and related topics. Biometrics 29:479-486 (1973).
44. Kupper LL, McMichael AJ, Spirtas R. A hybrid epidemiologic study design useful in estimating relative risk. J Am Stat Assoc 70:524-528 (1975).
45. Prentice RL. A case-cohort design for epidemiologic cohort studies and disease prevention trials. Biometrika 73:1-11 (1986).
46. Miettinen OS, Wang JD. An alternative to the proportionate mortality ratio. Am J Epidemiol 114:144-148 (1981).
47. Wang JD, Miettinen OS. The mortality odds ratio (MOR) in occupational mortality studies: selection of reference occupation(s) and reference cause(s) of death. Ann Acad Med 13(Suppl 2):312-316 (1984).
48. Butler WJ, Park RM. Use of the logistic regression model for the analysis of proportionate mortality data. Am J Epidemiol 125:515-523 (1987).


49. Robins JM, Blevins D. Analysis of proportionate mortality data using logistic regression models. Am J Epidemiol 125:524-535 (1987).
50. Greenland S, Neutra R. An analysis of detection bias and proposed corrections in the study of estrogens and endometrial cancer. J Chron Dis 34:433-438 (1981).
51. Feinstein AR. Methodologic problems and standards in case-control research. J Chron Dis 32:35-41 (1979).
52. White JE. A two-stage design for the study of the relationship between a rare exposure and a rare disease. Am J Epidemiol 115:119-128 (1982).
53. Walker AM. Anamorphic analysis: sampling and estimation for covariate effects when both exposure and disease are known. Biometrics 38:1025-1032 (1982).
54. Cain K, Breslow NE. Logistic regression analysis and efficient design for two-stage studies. Am J Epidemiol 128:1198-1206 (1988).
55. Breslow NE, Zhao LP. Logistic regression for stratified case-control studies. Biometrics 44:891-899 (1988).
56. Flanders WD, Greenland S. Analytic methods for two-stage case-control studies and other stratified designs. Stat Med 10:739-747 (1991).
57. Louis TA, Lavori PW, Bailar JC III, Polansky M. Crossover and self-controlled designs in clinical research. N Engl J Med 310:24-31 (1984).
58. Maclure M. The case-crossover design: a method for studying transient effects on the risk of acute events. Am J Epidemiol 133:144-153 (1991).
59. Caporaso NE, Hayes RB, Dosemeci M, Hoover R, Ayesh R, Hetzel M, Idle J. Lung cancer risk, occupational exposure, and the debrisoquine metabolic phenotype. Cancer Res 49:3675-3679 (1989).
60. Mack W, Langholz B, Thomas DC. Survival models for familial aggregation of cancer. Environ Health Perspect 87:27-35 (1990).
61. Susser E, Susser M. Familial aggregation studies: a note on their epidemiologic properties. Am J Epidemiol 129:23-30 (1989).
62. Claus EB, Risch NJ, Thompson WD. Age at onset as an indicator of familial risk of breast cancer. Am J Epidemiol 131:961-972 (1990).
63. Morgenstern H. Uses of ecologic analysis in epidemiologic research. Am J Public Health 72:1336-1344 (1982).
64. Ohno Y, Aoki K, Aoki N. A test of significance for geographic clusters of disease. Int J Epidemiol 8:273-281 (1979).
65. Mollie A, Richardson S. Empirical Bayes estimates of cancer mortality rates using spatial models. Stat Med 10:95-112 (1991).
66. Mahoney MC, Labrie DS, Nasca PC, Wolfgang PE, Burnett WS. Population density and cancer mortality differentials in New York State, 1978-1982. Int J Epidemiol 19:483-490 (1990).
67. Lee JAH, Petersen GR, Stevens RG, Vesanen K. The influence of age, year of birth, and date on mortality from malignant melanoma in the populations of England and Wales, Canada, and the white population of the United States. Am J Epidemiol 110:734-739 (1979).
68. Lee JAH. Melanoma and exposure to sunlight. Epidemiol Rev 4:110-136 (1982).
69. Wallenstein S, Gould MS, Kleinman M. Use of the scan statistic to detect time-space clustering. Am J Epidemiol 130:1057-1064 (1989).
70. Roberson PK. Controlling for time-varying population distributions in disease clustering studies. Am J Epidemiol 132:S131-S135 (1990).
71. Hatch M, Susser M. Background gamma radiation and childhood cancers within 10 miles of a U.S. nuclear plant. Int J Epidemiol 19:546-552 (1990).
72. Ostrom CW Jr. Time series analysis: regression techniques, 2nd ed. Quantitative applications in the social sciences, 07-009. Newbury Park, CA: Sage Publications, 1990.
73. McDowall D, McCleary R, Meidinger EE, Hay RA Jr. Interrupted time series analysis. Quantitative applications in the social sciences, 07-021. Newbury Park, CA: Sage Publications, 1980.
74. Sayrs LW. Pooled time series analysis. Quantitative applications in the social sciences, 07-070. Newbury Park, CA: Sage Publications, 1989.
75. Catalano R, Serxner S. Time series designs of potential interest to epidemiologists. Am J Epidemiol 126:724-731 (1987).
76. Gruchow HW, Rimm AA, Hoffmann RG. Alcohol consumption and ischemic heart disease mortality: are time-series correlations meaningful? Am J Epidemiol 118:641-650 (1983).
77. Darby SC, Doll R. Fallout, radiation doses near Dounreay, and childhood leukaemia. Br Med J 294:603-607 (1987).
78. Crawford MD, Gardner MJ, Morris JN. Changes in water hardness and local death-rates. Lancet 2:327-329 (1971).
79. Goodman LA. Some alternatives to ecological correlation. Am J Sociol 64:610-625 (1959).
80. Beral V, Chilvers C, Fraser P. On the estimation of relative risk from vital statistical data. J Epidemiol Community Health 33:159-162 (1979).
81. Greenland S. Divergent biases in ecologic and individual-level studies. Stat Med 11:1209-1223 (1992).
82. Selvin HC. Durkheim's suicide and problems of empirical research. Am J Sociol 63:607-619 (1958).
83. Langbein LI, Lichtman AJ. Ecological inference, series 07-010. Beverly Hills, CA: Sage Publications, 1978.
84. Firebaugh G. A rule for inferring individual-level relationships from aggregate data. Am Sociol Rev 43:557-572 (1978).
85. Richardson S, Stucker I, Hemon D. Comparisons of relative risks obtained in ecological and individual studies: some methodological considerations. Int J Epidemiol 16:111-120 (1987).
86. Greenland S, Morgenstern H. Ecological bias, confounding, and effect modification. Int J Epidemiol 18:269-274 (1989).
87. Greenland S, Morgenstern H. Neither within-region nor cross-regional independence of exposure and covariates prevents ecological bias (letter). Int J Epidemiol 20:816-817 (1991).
88. Robinson WS. Ecological correlations and the behavior of individuals. Am Sociol Rev 15:351-357 (1950).
89. Duncan OD, Cuzzort RP, Duncan B. Statistical geography: problems in analyzing areal data. Westport, CT: Greenwood Press, 1961.
90. Greenland S, Morgenstern H. Ecological bias and confounding (reply). Int J Epidemiol 19:766-767 (1990).
91. Richardson S, Hemon D. Ecological bias and confounding (letter). Int J Epidemiol 19:764-766 (1990).
92. Rosenbaum PR, Rubin DB. Difficulties with regression analyses of age-adjusted rates. Biometrics 40:437-443 (1984).
93. Brenner H, Savitz DA, Jockel KH, Greenland S. The effects of nondifferential exposure misclassification in ecological studies. Am J Epidemiol 135:85-95 (1992).
94. Greenland S. The effect of misclassification in the presence of covariates. Am J Epidemiol 112:564-569 (1980).
95. Savitz DA, Baron AE. Estimating and correcting for confounder misclassification. Am J Epidemiol 129:1062-1071 (1989).
96. Brenner H, Savitz DA, Greenland S. The effects of nondifferential confounder misclassification in ecologic studies. Epidemiology 3:456-459 (1992).
97. Stavraky KM. The role of ecologic analysis in studies of the etiology of disease: a discussion with reference to large bowel cancer. J Chron Dis 29:435-444 (1976).
98. Connor MJ, Gillings D. An empiric study of ecological inference. Am J Public Health 74:555-559 (1984).
99. Polissar L. The effect of migration on comparison of disease rates in geographic studies in the United States. Am J Epidemiol 111:175-182 (1980).
100. Morgenstern H, Horwitz SM, Berkman LF. Connections between epidemiology and health services research: a review of psychosocial effects on childhood morbidity and pediatric medical care use. J Ambulatory Care Management 9:33-45 (1986).
101. Walter SD, Irwig LM. Estimation of test error rates, disease prevalence and relative risk from misclassified data: a review. J Clin Epidemiol 41:923-937 (1988).
102. Marshall RJ. Validation study methods for estimating exposure proportions and odds ratios with misclassified data. J Clin Epidemiol 43:941-947 (1990).
103. Greenland S. Statistical uncertainty due to misclassification: implications for validation substudies. J Clin Epidemiol 41:1167-1174 (1988).
104. Spiegelman D, Gray R. Cost-efficient study designs for binary response data with Gaussian covariate measurement error. Biometrics 47:851-870 (1991).
105. Boyd LH Jr, Iversen GR. Contextual analysis: concepts and statistical techniques. Belmont, CA: Wadsworth Publishing Co., 1979.
106. Lincoln JR, Zeitz G. Organizational properties from aggregate data: separating individual and structural effects. Am Sociol Rev 45:391-408 (1980).
107. Aitkin M, Longford N. Statistical modeling issues in school effectiveness studies. J Roy Statist Soc (Series A) 149(Part 1):1-43 (1986).
108. Iversen GR. Contextual analysis. Quantitative applications in the social sciences, 07-081. Newbury Park, CA: Sage Publications, 1991.
109. Humphreys K, Carr-Hill R. Area variations in health outcomes: artefact or ecology. Int J Epidemiol 20:251-258 (1991).
110. Khoury MJ, Beaty TH, Flanders WD. Epidemiologic approaches to the use of DNA markers in the search for disease susceptibility genes. Epidemiol Rev 12:41-55 (1990).
111. Morgenstern H, Winn DM. A method for determining the sampling ratio in epidemiologic studies. Stat Med 2:387-396 (1983).
