Evaluation (Not Validation) of Quantitative Models: Oreskes

Naomi Oreskes*

Gallatin School of Individualized Study, New York University, New York, New York

The present regulatory climate has led to increasing demands for scientists to attest to the predictive reliability of numerical simulation models used to help set public policy, a process frequently referred to as model validation. But while model validation may reveal useful information, this paper argues that it is not possible to demonstrate the predictive reliability of any model of a complex natural system in advance of its actual use. All models embed uncertainties, and these uncertainties can and frequently do undermine predictive reliability. In the case of lead in the environment, we may categorize model uncertainties as theoretical, empirical, parametrical, and temporal. Theoretical uncertainties are aspects of the system that are not fully understood, such as the biokinetic pathways of lead metabolism. Empirical uncertainties are aspects of the system that are difficult (or impossible) to measure, such as actual lead ingestion by an individual child. Parametrical uncertainties arise when complexities in the system are simplified to provide manageable model input, such as representing longitudinal lead exposure by cross-sectional measurements. Temporal uncertainties arise from the assumption that systems are stable in time. A model may also be conceptually flawed. The Ptolemaic system of astronomy is a historical example of a model that was empirically adequate but based on a wrong conceptualization. Yet had it been computerized (and had the word then existed), its users would have had every right to call it validated. Thus, rather than talking about strategies for validation, we should be talking about means of evaluation. That is not to say that language alone will solve our problems or that the problems of model evaluation are primarily linguistic. The uncertainties inherent in large, complex models will not go away simply because we change the way we talk about them. But this is precisely the point: calling a model validated does not make it valid. Modelers and policymakers must continue to work toward finding effective ways to evaluate and judge the quality of their models, and to develop appropriate terminology to communicate these judgments to the public whose health and safety may be at stake. Environ Health Perspect 106(Suppl 6):1453-1460 (1998). http://ehpnet1.niehs.nih.gov/docs/1998/Suppl-6/1453-1460oreskes/abstract.html

Key words: model evaluation, model validation, quantitative models

This paper is based on a presentation at the Workshop on Model Validation Concepts and Their Application to Lead Models held 21-23 October 1996 in Chapel Hill, North Carolina. Manuscript received at EHP 16 January 1998; accepted 13 May 1998.

I am grateful to L. Small for help interpreting the Ohio vs. EPA decision, and to R. Elias and A. Marcus for inviting my contribution to this volume.

*Present affiliation: History Department and Program in Science Studies, University of California-San Diego, La Jolla, CA.

Address correspondence to N. Oreskes, History Dept. 0104, U.C. San Diego, 9500 Gilman Dr., La Jolla, CA 92093-4695. Telephone: (619) 534-4695. Fax: (619) 534-7283. E-mail: noreskes@ucsd.edu

Abbreviations used: PVC, polyvinyl chloride; IEUBK model, integrated exposure uptake biokinetic model; U.S. EPA, U.S. Environmental Protection Agency.

Environmental Health Perspectives * Vol 106, Supplement 6 * December 1998

Long experience has taught me that with regard to intellectual matters, this is the status of mankind: the less people know and understand about such matters, the more positively they attempt to reason about them.
- Galileo

About lead in the environment, this much is certain: lead is bad. Human ingestion of lead is associated with a number of clinically well-documented afflictions, not least of which is the retardation of brain development in infants and children. Thus in the 1970s, the U.S. government began to take steps to decrease human exposure to ambient lead, most significantly by banning the use of lead additives in gasoline (1-4). Similar actions have been taken in other countries (5). Scientists working on the problem of assessing and regulating lead in the environment thus enjoy the benefit of widespread agreement about the basic harmfulness of the substance being regulated. (This is not to say that the consensus was not hard-won: in the 1920s and 1930s, most health professionals opposed banning lead in gasoline [1,2].)

The political and scientific consensus on the harmfulness of lead stands in contrast to other recent debates in environmental health and safety (nuclear power, polyvinyl chloride, radon gas, to name a few) in which there have been heated and even bitter disagreements among government agencies, industrial organizations, labor unions, and citizens' groups as to the significance of the purported harms (6). In these cases, debates have arisen in part because of the difficulty of documenting exposure levels (thus proving harm) in nonoccupational settings. Such settings typically involve low-level exposures whose clinical effects may be difficult to discern and characteristically emerge only after considerable time. In addition, the harmful materials may not themselves reside in the body and therefore cannot be directly measured. Under such circumstances, scientific uncertainty is inevitable. Low-level radiation is a case in point. Because radiation does not reside in the bloodstream, it is difficult to document exposures in uncontrolled settings, and impossible to prove that low-level exposure caused a particular affliction in a particular individual. Such proofs must rely on statistical regularities in longitudinal studies of populations. In contrast, it is relatively easy to document who has been affected by lead: blood lead levels are measurable and the clinical effects of toxicity are readily discernible (7-11). In principle, therefore, it should be a comparatively straightforward task to set legal limits for lead in the environment.

In practice, however, the problem of setting regulatory standards for lead has been complicated by the growing recognition that very low levels of lead exposure may not be as safe as previously assumed (2,3,12-14). The problem of lead in the environment thus increasingly resembles other environmental health debates: the effects of low-level exposure (diminished school performance, attention disorders) may not be readily discernible and are difficult to diagnose. Even if accurately diagnosed, there is currently no safe medical treatment for low-level lead toxicity, and its most worrisome effects are irreversible. By
the time a child is diagnosed with lead poisoning, exposure has occurred and damage has been done. Thus there is a compelling need to understand the effects of low-level lead exposure in order to prevent lead poisoning. Toward this end, scientists have turned to numerical simulation models.

Scientists at the U.S. Environmental Protection Agency (U.S. EPA) have been charged with the task of determining the relationship between environmental lead exposure and adverse health effects, with the goal of setting appropriate regulatory standards for lead in air, soil, and water in the United States. To address the numerous variables involved, the U.S. EPA has developed the integrated exposure uptake biokinetic (IEUBK) model, a software package consisting of several linked computer programs that relate environmental lead exposure to blood lead levels in children (15-18). Model input consists of data on environmental lead exposures estimated by cross-sectional measurement of lead in air, soil, and water in children's homes. The data are fed into a biokinetic model that simulates the metabolism of lead in the children's bodies, and from this, estimates likely blood lead levels. In principle, the IEUBK model should be a powerful tool to help set nationwide regulatory standards, to identify communities in which current ambient lead levels are cause for concern, and to assess the likely impact of possible remedial actions in particular situations. In short, the goal of the IEUBK model is to prevent lead poisoning among American children, a goal that no right-minded person would dispute. But how do we know if the model is a good one? The demands of good science and the demands of democracy require evidence that the model is reliable (19-21).

Much of this demand has been expressed in terms of the need for model validation. As computer models are being used increasingly by federal, state, and local governments as a basis for policy decisions, there has been a concomitant demand for scientific agencies to attest to the legitimacy and reliability of these models, and to ensure that claims made on behalf of models are defensible. The safe level of lead exposure is a scientific question, but it comes to the fore within a social and political context. It was in this context that the U.S. EPA National Center for Environmental Assessment organized the October 1996 workshop titled "Lead Model Validation" to explore possible responses to the demand for evidence of the reliability of the IEUBK model. Scientists involved in the construction and use of the IEUBK model wanted to discuss what it means (or should mean) to call their model valid or to speak of its valid application (22). The title of the workshop presupposed both the necessity and the possibility of validating the IEUBK model, but organizers were also concerned with the question of whether one can validate a numerical simulation model at all, i.e., whether one can demonstrate that a model is reliable in advance of its use. Also at stake was the question of how the language of validation, that is, how we talk about what we do, affects both the process itself and our perception of it.

The purpose of this paper, emerging from that workshop, is to review the problem of uncertainty in the information obtained from complex models of natural systems in the context of the regulatory environment. This paper does not seek to offer specific recommendations on how to develop quantitative measures of uncertainty in any particular model. Such recommendations are best left to modelers themselves, and several recent papers offer such recommendations (23-28). Moreover, the notion of uncertainty quantification itself requires qualification. There are many sources of uncertainty in numerical models. Commonly, only a few are easily quantified, many or most are quantified only with difficulty, and several may not be quantifiable at all. If a model is conceptually flawed, quantification of input uncertainty will not make the model reliable. On the contrary, quantification may surround such a model with an aura of credibility that it does not deserve. Yet the demand for credibility is real enough. The current regulatory climate has led to a situation in which scientists frequently feel pressed to argue the strength of their models, often beyond the degree to which they feel entirely comfortable. It is one thing to ask that scientists discuss the pros and cons of a model but quite another to demand that they declare the model valid. Apart from the internal demands of the scientific community, the push for model validation is a response to the political exigencies of our times. How should scientists, in the capacity of scientists, respond?

Working from a False Pretense: The Notion of a Validatable Model

In recent years, scientists in various disciplines have developed the notion of model validation to refer to the process by which scientists attempt to demonstrate the reliability of a computer model. Hodges and Dewar (29), in a report for the RAND Corporation on computer models used by the military to evaluate the efficacy of weapons systems in battlefield scenarios, make the distinction between two kinds of models: those that can be validated and those that cannot. To be validatable, in their words, the situation being modeled must satisfy four criteria: a) it must be observable and measurable; b) it must exhibit constancy of structure in time; c) it must exhibit constancy across variations in conditions not specified in the model; and d) it must permit the collection of ample data.

Models in social and policy sciences generally fail to satisfy these criteria and therefore cannot be validated; that is, their reliability as a basis for prediction cannot be demonstrated. Because the systems are incompletely known and may change with time, a model that works well under one set of circumstances may fail under a different set of circumstances (29). In essence, such models are trying to "predict the unpredictable" (30). Bankes (30), also writing for RAND, concludes that the use of computer models for prediction in policy analysis is not only generally misleading but potentially dangerous, and in the case of battlefield scenarios, literally so. When used for prediction, these models provide only the illusion of certainty. At best, the result is a false sense of security; at worst, a dangerous hubris. Bankes advises that policy models should be used primarily in an explanatory mode, to explore the range and possible consequences of policy options, including worst-case scenarios. He notes that this normally requires the development of multiple models. Models sometimes produce results that surprise their creators, and in doing so elucidate unknown implications of known information and overt implications of covert assumptions. Nonpredictive models can be informative but only as long as they are used in question-driven rather than answer-seeking frameworks (30,31).

The RAND authors restrict their arguments to models in the policy domain and suggest that their caveats do not apply to the hard sciences in which model predictions can be experimentally verified. But this is an arguable point; many of the difficulties encountered in the social world also apply in the physical world. Oreskes et al. and Oreskes (32,33), in a discussion of computer models in the earth sciences, note that the criteria outlined above (measurability, accessibility, and temporal and spatial invariance) are precisely those
features typically lacking in the natural systems that scientists are increasingly exploring with computer models. The reason is evident: If a physical situation fully satisfied these criteria, there would be little need for a numerical simulation. It could be described, in most cases, with a small number of deterministic equations. Computer models are needed and have become increasingly common in the natural sciences precisely because scientists are grappling with complex systems involving multiple interacting variables that are difficult to access, hard to measure, and may change in space or time. Furthermore, the interrelationships between these variables may be indeterminate or at least not yet determined.

There are, of course, computer models that predict singular deterministic events in the natural world. Celestial mechanics provides an example: computer models are commonly used to predict the positions of celestial bodies. As the recent collision of Comet Shoemaker-Levy with Jupiter shows, models in this field are very successful. The location and timing of this collision was predicted to a high degree of accuracy more than a year in advance. One might thus claim that such models can be validated by reference to actual events, and have been. But models in celestial mechanics represent relatively simple physical systems in which the operative forces can be described by a small number of deterministic equations, and in which the variables (e.g., the mass of Jupiter) are measurable constants. Indeed, they are the exception that proves the rule because people have been predicting the positions of the celestial objects for millennia, long before the advent of digital computers. Computer models in celestial mechanics are a matter of convenience, not necessity.

Most models in the natural sciences are different. They involve data that are indeed variable and difficult to measure. Consider lead in the environment. Lead exposure and uptake may depend on lead concentrations in soil, water, air, and household dust; the size and quantity of lead paint chips in a household; the amount of soil or number of paint chips that a child eats; the amount of time a child spends outdoors; whether she washes her hands before eating and, if so, for how long she scrubs; and so on. Each of these variables is difficult to quantify. Indeed, if one could quantify by monitoring the amount of paint a child ate, one would be morally compelled to intervene to prevent further ingestion. (In practice, measurement of household dust is used as a surrogate for ingestion level, but children in the same household will have different levels of ingestion due to different patterns of behavior.) The input variables may also change with time and with the seasons, e.g., if a child spends more or less time out of doors; as the child grows up and changes his habits or begins to attend school; or unpredictably, if the family moves or has a change in its economic or childcare situation. Short-duration sampling of lead levels in a child's environment provides only an estimate of actual lead exposure, and this, in turn, delimits only the range of possibilities for actual lead uptake. Furthermore, the meaning of these variables may not be invariant. There is some evidence that the same exposure levels may produce different effects in different people, perhaps because of inborn or developmental contrasts in susceptibilities, nutrition, or synergistic effects with other elements in the environment (34).

We model systems like these precisely because of their complex nature, as a means for grappling with complex variables, and toward the important social goal of preventing future cases of lead toxicity. But in the process of constructing the model, we embed uncertainty, and, as the examples given above indicate, only some of this uncertainty can be estimated, much less directly measured. The issue of inborn susceptibility differentials, for example, is very poorly understood. Future research may lead to a better understanding of why different individuals react differently to the same exposure, but for now, uncertainty remains.

How does this embedded uncertainty affect the predictive reliability of the model? That is a question that cannot be established a priori. It can be established only through the actual use of the model. And this is why models of complex systems, whether in the social or the physical and biologic sciences, cannot be validated in the sense that the RAND authors imply. There is no way to demonstrate the predictive reliability of such models. To imply otherwise by using the language of validation is misleading. But if we cannot demonstrate the predictive reliability of the model in advance, then how should we be evaluating the merits and demerits of a complex numerical simulation model? One step in the process may be to realize that prediction is not as important as it is often thought to be, for predictive power is itself a fallible judge of scientific knowledge.

Limits of Prediction

The RAND authors cited above assume that models in the natural sciences can be validated because their predictions may be tested by observation in the natural world. In making this claim, they are implicitly invoking the hypothetico-deductive model of science, namely, that scientific theories can be thought of as statements that entail logically necessary deductive consequences: predictions. If the predictions of a theory come true, then we have warrant for faith in that theory. But this focus on prediction may be misplaced.

A fundamental problem with the hypothetico-deductive model, as many philosophers have realized, is that it assumes closed systems. A statement of the form p entails q works if and only if the statement describes a closed system. But a closed system is a philosophical ideal, not a natural kind. Real-life systems are never closed, and experimental tests inevitably embed hidden assumptions (32,35,36). Because these embedded assumptions may be faulty, a true theory may fail its experimental test. A famous example of this is found in the history of astronomy. Scientists in the 16th century suggested that if the earth orbited the sun, as Copernicus proposed, then the angular position of a given star would change during the course of the year as the earth moved through its orbit. But when astronomers looked for this stellar parallax, they found none, and they rejected the Copernican theory (37,38). Implicitly, they were assuming that the earth's orbit was large relative to the distance of the stars and that their telescopes were powerful enough to detect the changes that occurred. Both these assumptions turned out to be very wrong!

In the case of Copernican astronomy, scientists rejected a theory that turned out to be true, but what about the reverse? Have scientists ever accepted a theory on the basis of successful predictions but later discovered that the theory was false? To be sure. The alternative to Copernican theory, the Ptolemaic system, was confirmed by reams of observational evidence and scores of successful predictions of planetary events (37). Scientists in the 16th century had grounds for accepting the Ptolemaic system. Had it been computerized, its makers would have had every reason to call it validated (assuming that word had then existed). Yet, as we all know, the Ptolemaic system was fundamentally wrong. It was wrong not because it failed
its predictive tests but because the basic conceptualization of the universe that supported it was faulty.

In light of historical examples like this one, the philosopher of science Karl Popper famously argued that no scientific theory or model can ever be proved right, only wrong (39,40). If our observations are inconsistent with theoretical prediction, then we know something is amiss, but if our observations satisfy theoretical prediction, all we know is that the theory has not yet been proven wrong. Whether the theory will continue to work in the future is an open question. The longer a theory has been around and the more experimental tests it has passed, the more likely it seems that the theory is right but only in a probabilistic, not a deterministic, sense.

Scientists, of course, know this at least implicitly, and many modelers will argue that when they use the word validation they do not mean to imply that their model is literally true. They simply mean that it is not evidently false. The modelers have gone through a series of exercises to show that there are no major defects in the model and that they have done their "level best" (41). Validation, in this view, is a process of confidence building, of building a case for the model (25,42,43). A validated model, therefore, although not true strictly speaking, may be provisionally accepted (44). These are reasonable claims, hardly likely to provoke profound epistemic discontent, and they are certainly consistent with the first dictionary definition of the word valid: without obvious flaws or defects (45). From this definition, validation should simply imply the process in which obvious flaws are corrected.

But although these claims are reasonable, they are also problematic. One may remove obvious errors in a model while more subtle ones remain. If validation were merely the process of removing obvious defects, this would scarcely be sufficient for regulatory purposes. Regulatory agencies and the public seek assurance not merely that a model is free of gross error but that it provides a reliable basis for decision making (19,20,46). But to imply that the model provides a reliable basis for decision making is to imply that the model provides an accurate and substantially complete representation of the natural world. This, of course, is how people outside the modeling community interpret validation. In common usage, valid is taken as synonymous with correct, i.e., true, and elsewhere in the dictionary we find precisely that definition: "Valid implies being supported by objective truth" (45). The disclaimer that scientists know what they mean when they talk about validation would work if the models under discussion were being used solely within the confines of the relevant scientific communities. But very often they are not. Numerical simulation models are increasingly being used, often commissioned, by public agencies whose constituents are not privy to local scientific consensus.

Furthermore, individual scientists may claim that model validation does not imply an assertion about reality (47), but the official pronouncements of the regulatory agencies for whom they work frequently belie this claim. The Department of Energy, for example, has defined validation as the determination that a "model indeed reflects the behavior of the real world" (48). The International Atomic Energy Agency (49) has defined a validated model as one that provides a "good representation of the actual processes occurring in a real system." (The use of the word "actual" by the European agency is telling. In the 19th century, the French word actual was borrowed by both English and German scientists as a synonym for real and observable.) Protestations of scientists notwithstanding, it is evident why these regulatory agencies make these claims: Were they to describe validation only as a process of checking for gross error, it would be inadequate as a basis from which to forge political consensus (50).

A recent court case underscores this point. In 1986, the U.S. EPA was sued for failing to demonstrate the accuracy of a computer model used to set emissions limits under the Clean Air Act for two electric power plants in the state of Ohio. The question at stake was how much pollution could be emitted from the power plants without causing local air pollution levels to exceed federal standards, and the U.S. EPA had used a computer model to determine the answer. But the model was not predictively reliable. The resulting pollution levels violated the Clean Air Act, and the state government of Ohio took the U.S. EPA to court. The Sixth Circuit of the U.S. Court of Appeals ruled in favor of Ohio, finding against the U.S. EPA because it had used the computer model "without adequately validating, monitoring, or testing its reliability" (51,52). The U.S. EPA, the court concluded, acted arbitrarily in failing to establish the accuracy or trustworthiness of the model prior to basing decisions upon it, and ordered the agency "to test and validate the model as an adequate forecasting technique" (53). A notable feature of this case is that the utility companies that owned the plants were effectively shielded from liability for the pollution that their plants had caused because it was the U.S. EPA, not they, that had set the emission limits.

One could, of course, read this decision as implying that had the U.S. EPA validated the model, then the agency would have been blameless despite the model's predictive failure. After all, the action of the court was to order the U.S. EPA to validate the model! From this perspective, the more restricted notion of validation might at first sight appear adequate for regulatory purposes. But this is clearly not quite what the court intended. In the words of the decision, "In order to be useful, a model must accurately predict the 'behavior' of the ... system being modeled." The argument of the petitioners against U.S. EPA was that "the model's predictions are not accurate..." (53). In fact, the U.S. EPA had validated the computer model: it had compared model output to empirical outcomes at four other sites. What the U.S. EPA had not done was test the model at the particular site and subsequently monitor the emissions. The court recognized that testing and monitoring at every site may not be practical (indeed, this is a primary reason for constructing simulation models in the first place), but it remains an open question as to how much site-specific testing and monitoring is required to satisfy legal and community standards. In this regard, scientists have an important role to play in openly discussing the problems and trade-offs involved.

Regulation and legal liability are not the only issues at stake here, nor are they, from a scientific and moral perspective, the most important ones. It may be possible to satisfy the legal standard of acting in a manner that is not arbitrary but fail to satisfy the scientific standard of producing reliable knowledge. Ultimately, the purpose of air pollution controls is to safeguard human health and property and preserve ecosystems. The purpose of the IEUBK model is to prevent new cases of lead poisoning. From this perspective, the issue is not whether the courts will be content with good-faith efforts; the issue is whether the model gives accurate results. In issues of public health and safety, we all have a stake in knowing that decisions made upon the basis of numerical simulation models turn out to be right.

Are Validated Models Valid?

Even if we were to set aside the conceptual issues raised by the example of celestial mechanics and accept the restricted definition of validation, i.e., that a valid model is one without obvious flaws or defects, would it then be possible to say that a given model is valid? The simple answer is no, because even our best models have known flaws. Science motivated by social needs may suffer this problem to a greater extent than science based on questions arising within a disciplinary framework. In the lab, scientists may define a problem in such a way as to rely primarily on areas where databases and conceptual understandings are very rich, and from this core of understanding venture outward toward the less well known. Scientists often refer to this as the well-posed problem. Throughout their history, scientists, both as individuals and as professional communities, have often set aside problems that could not be well posed.

Problems arising from social needs typically are not well posed because the world does not wait for scientific understanding. Where scientists have been asked to make models for use in the policy domain, whether the issue is lead poisoning, global climate change, or the safe disposal of radioactive waste, our theoretical understanding and empirical databases are never what we wish them to be. There are always known flaws and defects in large, complex,

function, but we necessarily measure them at singular points in time, hoping that the points are adequately representative.

Temporal errors arise from the assumption that systems are stable in time when they are not. For example, when we parameterize a lead model, we represent longitudinal lead exposure through cross-sectional lead measurements and assume, perhaps falsely, that these cross-sectional measurements are representative. Even if they are representative, it might be from a biologic standpoint that the highs and lows are as important as the means. Temporal variations may be important in ways that are neither fully understood nor even fully measured.

Validation versus Evaluation

Most scientists are aware of the limitations of their models, yet this private understanding contrasts with the public use of affirmative language to describe model results. Published papers on validation are littered with positive terms: nouns like acceptance and substantiation, adjectives like satisfactory, adequate, and credible. The very word validation implies an affirmative result, that the process of validation will somehow validate the model (32). But where are the negative terms? If the purpose of validation is to determine whether a model is working well, shouldn't one also see nouns like rejection and refutation, adjectives like unsatisfactory

It is common to hear in regulatory and scientific circles that public fears are irrational, and there is substantial evidence that public fears are irrational if viewed from a statistical standpoint (54,55). But the language of validation does little to assuage such fears. Indeed, it exacerbates them because the public has learned (not without some justification) to be suspicious of reassurances (6,55). When citizens hear only positive claims, they start to doubt them, and they may sometimes be right: Some modelers have been guilty of exercises in legitimation of a predetermined result. A perhaps surprising example can be found in the work of the Club of Rome.

The world model was developed by Meadows et al. (56) in the early 1970s for the Club of Rome, a group of European industrialists, statesmen, and scientists concerned about overuse of natural resources. The model, described in the widely read book The Limits to Growth, predicted widespread natural resource shortages, exponential price increases for raw materials, and possibly global economic collapse before the end of the century (56). The end of the century is here and resource use continues to grow, but proven reserves of natural materials are greater today than in 1972 and real prices are down for virtually all commodities (57).

One reason why the predictions of the world model have not come true is obvious
policy-driven models. and inadequate? The exercise of comparing in hindsight: the static way in which the
We can think of these flaws as falling a model with observations in the natural model treated natural resources. Natural
into four categories: theoretical, empirical, world is a test like any other scientific test, resources, such as copper, chromium, sil-
parametrical, and temporal. Theoretical and it must be possible for a model to fail ver, and gold, were treated in the model as
flaws are the things we do not fully under- that test. If the context of validation is such fixed and finite masses whose volumes
stand or do not have the mathematics to that only positive results emerge, then could only decrease as use increased. On
handle. In the case of lead toxicity, this something is wrong. one level, this view is indisputable; the
would include, for example, the problem The conspicuous absence of negative mass of chromium in the earth is a fixed
of differential susceptibility and the ques- language in the scientific literature of vali- (albeit unknown) number. But on another
tion of whether there is a safe threshold dation should give us pause, for it raises the level, this view is hopelessly inadequate
level of exposure. Empirical flaws are the following question relevant to both scien- because it ignores the fact that the resource
things we cannot fully or precisely mea- tific and regulatory perspectives: Is the of chromium is not the same as the mass
sure. This includes the pragmatic problem computer model a vehicle to prove what of it in the earth. A resource is something
of having limited resources with which to we think we already know or is it an honest that may be used by humans. This invol-
measure lead in the environment, and the attempt to find answers that are not prede- ves a number of factors, including the
difficulties of sampling bias and analytical termined? Put this way, it becomes clear price that people are willing to pay for it,
uncertainty, particularly at the very low that the goal of scientists working in a reg- the human and monetary capital available
exposure levels where regulatory limits ulatory context should be not validation to look for it, the technology available for
will be set. Parametrical flaws are the but evaluation, and where necessary, modi- extracting it, and the cost of labor used to
errors introduced when we reduce com- fication and even rejection. Evaluation get it. A reserve is an even more constricted
plex empirical phenomena to single or implies an assessment in which both posi- thing: reserves consist only of that portion
simply varying input parameters in a tive and negative results are possible, and of a resource that has been discovered,
model. Lead exposures vary continuously where the grounds on which a model is measured, and delineated.
with time, for example, but models declared good enough are clearly articu- The world modelers made an elision
require input of a single value or a finite lated. Validation implies an exercise in between the known reserves of a metal and
set of values for each individual. Likewise, legitimation, and this is precisely what the the total mass of that metal in the world as
blood lead levels are a continuously varying public fears. if they were the same thing. But they are

Environmental Health Perspectives * Vol 106, Supplement 6 * December 1998 1457


N. ORESKES

not. Whereas the total mass of a metal in the earth must decrease or stay the same over time, reserves can increase as a result of increased exploration, improved technology, and/or decreased costs. Proven reserves of most metals have increased since 1973, primarily because of more and more effective geologic exploration during the past two decades, and prices have fallen as a result (57,58).

Why did the world modelers make what is in retrospect such an obvious mistake? One reason is revealed by the post hoc comments of Aurelio Peccei, one of the founders of the Club of Rome. The goal of the world model, Peccei explained in 1977, was to "put a message across," to build a vehicle to move the hearts and minds of men (59,21). The answer was predetermined by the belief systems of the modelers. They believed that natural resources were being taxed beyond the earth's capacity and their goal was to alert people to this state of affairs. The result was established before the model was ever built. In their sequel, Beyond the Limits, Meadows et al. (60) explicitly state that their goal is not to pose questions about economic systems, not to use their model in a question-driven framework, but to demonstrate the necessity of social change. "The ideas of limits, sustainability [and] sufficiency," they write, "are guides to a new world" (60).

One need not engage in an argument for or against social change to see the problem with this kind of approach if applied in a regulatory framework. The purpose of scientific work is not to demonstrate the need for social change (no matter how needed such change may be) but to answer questions about the natural world. The purpose of modeling is to pose and delineate the range of likely answers to "What if?" questions. The purpose of lead models should not be to demonstrate how bad lead ingestion is or how good U.S. EPA standards are but to try to find out what is most likely to happen if given standards are applied. The language of validation undermines this goal. It presupposes an affirmative result and implies that the model is on track. To outsiders, it raises the specter that the answer was preestablished.

There are other ways to talk about the problem. As Hodges and Dewar (29) write, the quality of a model is not equivalent to "agreement of the model with reality." Quality can be evaluated in several ways: on the basis of the underlying scientific principles, on the basis of quantity and quality of input parameters, and on the ability of a model to reproduce independent empirical data. All of these things can be discussed, but none of them should be discussed in either/or terms. Scientists should resist the demand to describe any model, no matter how good, as validated. Rather than talking about strategies for validation, we should be talking about means of evaluation.

That is not to say that language alone will solve our problems, or that the problems of model evaluation are primarily linguistic. The uncertainties inherent in large, complex models will not go away simply because we change the way we talk about them. But that is precisely the point: calling a model validated doesn't make it valid. The language of validation buries uncertainty; as scientists, we should be doing the opposite. We have an obligation to invite open discussion of uncertainties. And the more politically charged the issue at hand, the more essential it is that these uncertainties be articulated clearly, freely, and in language that anyone can understand.

One hundred years ago, Lord Kelvin famously tried to eliminate uncertainty over the age of the earth. Based on the concept of uniformitarianism, the assumption that observable geologic processes are representative of earth history in general, geologists in the late 19th century concluded that the earth was probably a few billion years old. But they had no way to prove it, and efforts to calculate the earth's age precisely had produced numbers as low as 100 million and as high as several hundreds of billions. Kelvin, famous for his penchant for quantitative precision, applied Fourier's theorem of conductive cooling to the question. Assuming that the earth had solidified from an incandescent globe, he obtained a maximum time of 98 million years for it to have cooled to its present surface temperature, and he promptly declared the entire science of geology invalid. Any conceptual scheme that implied a billion-year-old earth was fundamentally flawed, he declared. Pursuing the same logic, he dismissed Darwin's theory of natural selection on the grounds of inadequate time for it to operate (61,62).

For several decades, Kelvin's more certain result held sway and evolutionists were in nearly full retreat until the discovery of radiogenic heat proved that it was Kelvin rather than the geologists whose conceptualization was faulty. We know now, of course, that the earth is 4.5 billion years old, more than enough time for natural selection to have operated as Darwin envisaged it. Kelvin's calculations, although theoretically valid and highly precise, produced a result inaccurate by a factor of roughly 50. In his desire for certainty, Lord Kelvin made one of the most colossal blunders in the history of modern science. As his infamous mistake clearly shows, the uncontrolled desire for certainty may lead to fallacious quantification and a false sense of security.
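
The size of Kelvin's error can be checked with a line of arithmetic. The short Python sketch below uses only the two figures quoted in the text (a maximum cooling time of 98 million years and a currently accepted age of 4.5 billion years); the function and constant names are illustrative assumptions, not part of any cited work.

```python
# Figures quoted in the text; the names below are illustrative only.
KELVIN_MAX_AGE_YEARS = 98e6   # Kelvin's maximum cooling time for the earth
ACCEPTED_AGE_YEARS = 4.5e9    # currently accepted age of the earth


def underestimate_factor(estimate: float, accepted: float) -> float:
    """Return how many times smaller an estimate is than the accepted value."""
    if estimate <= 0:
        raise ValueError("estimate must be positive")
    return accepted / estimate


factor = underestimate_factor(KELVIN_MAX_AGE_YEARS, ACCEPTED_AGE_YEARS)
print(f"Kelvin's maximum age is too small by a factor of about {factor:.0f}")
```

The ratio comes out a little under 50, consistent with the round figure given above. The calculation says nothing about the precision of Kelvin's method; it shows only that a precise result can still be far from accurate when a conceptual assumption (here, the absence of radiogenic heat) is wrong.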

REFERENCES AND NOTES


1. Rosner D, Markowitz G. A "gift of God"? The public health controversy over leaded gasoline in the 1920s. Am J Public Health 75:344-352 (1985).
2. Lippmann M. 1989 Alice Hamilton Lecture. Lead and human health: background and recent findings. Environ Res 51:1-24 (1990).
3. Mushak P. Defining lead as the premiere environmental health issue for children in America: criteria and their quantitative application. Environ Res 59:281-309 (1992).
4. Brush SG. Transmuted Past: The Age of the Earth and the Evolution of the Elements from Lyell to Patterson. Cambridge, UK:Cambridge University Press, 1996.
5. Ducoffre G, Claeys F, Bruaux P. Lowering time trend of blood lead levels in Belgium since 1978. Environ Res 51:25-34 (1990).
6. Nelkin D, ed. Controversy: Politics of Technical Decisions, 3rd ed. Newbury Park:Sage Publications, 1992.
7. Bornschein RL, Clark CS, Grote J, Peace B, Roda S, Succop P. Soil lead-blood lead relationship in a former lead mining town. In: Lead in Soil: Issues and Guidelines (Davies BE, Wixson BG, eds). Northwood, UK:Science Reviews, 1988;149-160.
8. Miller GD, Massaro TF, Massaro EJ. Interactions between lead and essential elements, a review. Neurotoxicology 11:99-120 (1990).
9. O'Flaherty EJ. Physiologically based models for bone-seeking elements. IV: Kinetics of lead disposition in humans. Toxicol Appl Pharmacol 118:16-29 (1993).


10. O'Flaherty EJ. Physiologically based models for bone-seeking elements. V: Lead absorption and disposition in childhood. Toxicol Appl Pharmacol 131:297-308 (1995).
11. Menton RG, Burgoon DA, Marcus AH. Pathways of lead contamination for the Brigham and Women's Hospital Longitudinal Lead Study. In: Lead in Paint, Soil and Dust: Health Risks, Exposure Studies, Control Measures, Measurement Methods, and Quality Assurance. ASTM STP 1226 (Beard ME, Allen Iske SD, eds). Philadelphia:American Society for Testing and Materials, 1995;92-106.
12. U.S. EPA. Air Quality Criteria for Lead. EPA-600/8-83-028. Research Triangle Park, NC:U.S. Environmental Protection Agency, 1986.
13. Davis JM, Svendsgaard DJ. Lead and child development. Nature 329:297-300 (1987).
14. Leggett RW. An age-specific kinetic model of lead metabolism in humans. Environ Health Perspect 101:598-616 (1993).
15. U.S. EPA. Validation Strategy for the Integrated Exposure Uptake Biokinetic Model for Lead in Children. EPA 540/R-94-039. Washington:U.S. Environmental Protection Agency, 1994.
16. Marcus AH, Cohen J. Modeling the blood lead-soil lead relationship. In: Lead in Soil: Issues and Guidelines (Davies BE, Wixson BG, eds). Northwood, UK:Science Reviews, 1988;161-174.
17. Marcus AH, Elias RW. Estimating the contribution of lead-based paint to soil lead, dust lead, and childhood blood lead. In: Lead in Paint, Soil and Dust: Health Risks, Exposure Studies, Control Measures, Measurement Methods, and Quality Assurance. ASTM STP 1226 (Beard ME, Allen Iske SD, eds). Philadelphia:American Society for Testing and Materials, 1995;12-23.
18. Hogan KA, Elias RW, Marcus AH, White PD. Unpublished data.
19. Jasanoff S. The misrule of law at OSHA. In: The Language of Risk: Conflicting Perspectives on Occupational Health (Nelkin D, ed). Beverly Hills, CA:Sage Publications, 1985;155-178.
20. Jasanoff S. Science, politics, and the renegotiation of expertise at EPA. Osiris 7:195-217 (1992).
21. Shackley S. Trust in models? The mediating and transformative role of computer models in environmental discourse. In: International Handbook of Environmental Sociology (Redclift M, Woodgate G, eds). (Forthcoming). Cheltenham, UK:Edward Elgar, 1997;237-260.
22. Marcus AH, Elias RW. Model Validation Workshop. Draft Outline. Research Triangle Park, NC:U.S. Environmental Protection Agency, 1996.
23. Balls M, Blaauboer B, Brusick D, Frazier J, Lamb D, Pemberton M, Reinhardt CA, Roberfroid M, Rosenkranz H, Schmid B, et al. Report and Recommendations of the CAAT/ERGATT Workshop on the Validation of Toxicity of Test Procedures. ATLA Altern Lab Anim 18:313-227 (1990).
24. Balls M, Blaauboer B, Fentem JH, Bruner L, Combes RD, Ekwall B, Fielder RJ, Guillouzo A, Lewis RW, Lovell DP, et al. Practical aspects of the validation of toxicity test procedures: report and recommendations of ECVAM Workshop 5. ATLA Altern Lab Anim 23:129-147 (1995).
25. Dee DP. A pragmatic approach to model validation. In: Quantitative Skill Assessment for Coastal Ocean Models (Lynch DR, Davies AM, eds). Washington:American Geophysical Union, 1994.
26. Curren RD, Southee JA, Speilmann H, Liebsch M, Fentem JH, Balls M. The role of prevalidation in the development, validation and acceptance of alternative methods. ATLA Altern Lab Anim 23:211-217 (1995).
27. Bruner LH, Carr GJ, Chamberlain M, Curren RD. Validation of alternative methods for toxicity testing. Toxicol in Vitro 10:479-501 (1996).
28. Marcus AH, Elias RW. Some useful statistical methods for model validation. Environ Health Perspect 106(Suppl 6):1541-1550 (1998).
29. Hodges JS, Dewar JA. Is It You or Your Model Talking? A Framework For Model Validation. Rpt no. R-4114-AF/A/OSD. Santa Monica, CA:RAND Corporation, 1992.
30. Bankes SC. Exploratory Modeling and the Use of Simulation for Policy Analysis. Rpt no. N-3093-A. Santa Monica, CA:RAND Corporation, 1992.
31. Hodges JS. Six (or so) things you can do with a bad model. Operations Res 39:355-365 (1991).
32. Oreskes N, Shrader-Frechette K, Belitz K. Verification, validation, and confirmation of numerical models in the earth sciences. Science 263:641-646 (1994).
33. Oreskes N. Testing models of natural systems: can it be done? In: Structures and Norms in Science (Chiara ML, Doets K, Mundici D, van Benthem J, eds). Amsterdam:Kluwer Academic Publishers, 1997;207-217.
34. Van Damme K, Casteleyn L, Heseltine E, Huici A, Sorsa M, van Larebeke N, Vineis P. Individual susceptibility and prevention of occupational diseases: scientific and ethical issues. J Occup Environ Med 37:91-99 (1995).
35. Konikow LF, Bredehoeft JD. Ground-water models cannot be validated. Adv Water Resour 15:75-83 (1992).
36. Nordstrom DK. On evaluating and applying aqueous geochemical models. EOS Trans Am Geophys Union Suppl (April 20):326 (1993).
37. Kuhn TS. The Copernican Revolution: Planetary Astronomy in the Development of Western Thought. Cambridge, MA:Harvard University Press, 1957.
38. Hempel CG. Philosophy of Natural Science. Englewood Cliffs, NJ:Prentice-Hall, 1966;23-24.
39. Popper KR. The Logic of Scientific Discovery. New York:Harper Torchbooks, 1937.
40. Popper KR. Conjectures and Refutations: The Growth of Scientific Knowledge. New York:Harper Torchbooks, 1963.
41. de Marsily G, Combes P, Goblet P. Comment on "Ground-water models cannot be validated" by L.F. Konikow and J.D. Bredehoeft. Adv Water Resour 15:367-369 (1992).
42. Neuman SP. Validation of safety assessment models as a process of scientific and public confidence building. In: High Level Radioactive Waste Management: Proceedings of the Third International Conference, 12-16 April 1992, Las Vegas, Nevada. New York:American Society of Nuclear Engineers, 1992;1404-1413.
43. Nir A, Doughty C, Tsang DF. Validation of design procedure and performance modeling of a heat and fluid transport field experiment in the unsaturated zone. Adv Water Resour 15:153-166 (1992).
44. Rykiel EJ. The meaning of models [Letter]. Science 264:330-331 (1994).
45. Woolf HB, ed. Webster's New Collegiate Dictionary. Springfield, MA:G&C Merriam Co, 1973.
46. Beer FA. Validities: a political science perspective. Soc Epistemol 7:85-105 (1993).
47. Younker JL, Boak JM. Geological models [Letter]. Science 264:1065 (1994).
48. U.S. Department of Energy. Environmental Assessment Overview: Yucca Mountain Site, Nevada Research and Development Area. Rpt no. DOE/RW-0079. Washington:Office of Civilian Radioactive Waste Management, 1986.
49. International Atomic Energy Agency. Radioactive Waste Management Glossary. Doc no IAEA-TECDOC-264. Vienna:International Atomic Energy Agency, 1982.
50. Jasanoff S. The Fifth Branch: Science Advisors as Policymakers. Cambridge, MA:Harvard University Press, 1990.
51. Ohio vs U.S. Environmental Protection Agency. U.S. Law Week 54:2494 (1986).
52. Davis PA, Olague NE, Goodrich MT. Approaches for the Validation of Models Used for Performance Assessment of High-Level Nuclear Waste Repositories. SAND90-0575/NUREG CR-5537. Albuquerque, NM:Sandia National Laboratories, 1991.
53. Ohio vs U.S. Environmental Protection Agency. U.S. Court of Appeals, Sixth Circuit, 23 ERC 2091-2097, 1986.
54. Cohen BL. Before It's Too Late: A Scientist's Case for Nuclear Energy. New York:Plenum Press, 1983.


55. Shrader-Frechette KS. Risk and Rationality. Los Angeles:University of California Press, 1991.
56. Meadows DH, Meadows DL, Randers J. The Limits to Growth: A Report for the Club of Rome's Project on the Predicament of Mankind. New York:Universe Books, 1972.
57. Simon JL, Kahn H, eds. The Resourceful Earth: A Response to Global 2000. Oxford:Blackwell, 1984.
58. Tierney J. Betting the planet. New York Times Magazine (December 2):52-81 (1992).
59. Peccei A. The Human Quality. Oxford:Pergamon Press, 1977.
60. Meadows DH, Meadows DL, Randers J. Beyond the Limits: Confronting Global Collapse, Envisioning a Sustainable Future. White River Junction, VT:Chelsea Green Publishing Company, 1992.
61. Burchfield JD. Lord Kelvin and the Age of the Earth. Chicago:University of Chicago Press, 1990.
62. Smith C, Wise MN. Energy and Empire: A Biographical Study of Lord Kelvin. Cambridge, UK:Cambridge University Press, 1989.
