DONALD RAYNOR, JR., ET AL.,
Plaintiffs,
vs.
RICHARDSON-MERRELL INC.,
Defendants.

Civil Action No. 83-3506
(Judge Thomas F. Hogan)

MOTION IN LIMINE REGARDING THE SCOPE OF SCIENTIFIC EVIDENCE TO BE OFFERED AT TRIAL

Defendants Merrell Dow Pharmaceuticals Inc., sued herein under its former name Richardson-Merrell Inc., and Standard Drug Company Inc. respectfully move this Court for an order governing the scope of the statistical evidence to be offered at trial. Unless statistical evidence used at trial conforms to generally accepted standards and is material, the danger of misleading the jury and prejudicing the party against whom such evidence is offered is so grave as to require that such evidence be excluded. The reasons for this motion are more fully set forth in defendants' accompanying memorandum of points and authorities.

Dated: July 3, 1986

Respectfully submitted,

MARK L. AUSTRIAN
Bar No. 346593
PATRICK J. COYNE
Bar No. 366841
COLLIER, SHANNON, RILL & SCOTT
1055 Thomas Jefferson Street, N.W.
Washington, D.C. 20007
(202) 342-8400

Attorneys for Defendants Merrell Dow Pharmaceuticals Inc. and Standard Drug Co., Inc.

UNITED STATES DISTRICT COURT FOR THE DISTRICT OF COLUMBIA

DONALD RAYNOR, JR., ET AL.,
Plaintiffs,
vs.
RICHARDSON-MERRELL INC.,
Defendants.

Civil Action No. 83-3506
(Judge Thomas F. Hogan)

ORDER GOVERNING SCOPE OF STATISTICAL EVIDENCE OFFERED AT TRIAL

On the basis of the motion in limine submitted by defendants Merrell Dow Pharmaceuticals Inc. and Standard Drug Company Inc. regarding the scope of scientific evidence to be offered at trial, the memoranda of the parties in support of and in opposition to that motion, and argument of counsel, it is this ____ day of ________, 1986, hereby

ORDERED that:

1. Statistical evidence be, and hereby is, admissible in evidence only if that evidence is statistically significant at a confidence level of 95%; and

2.
Plaintiffs be, and hereby are, precluded from presenting alleged methodological flaws and other errors in epidemiological and animal studies in their case in chief unless plaintiffs can make a satisfactory showing that, absent the flaws, the studies would affirmatively demonstrate that Bendectin is a teratogen at the 95% confidence level.

THOMAS F. HOGAN
UNITED STATES DISTRICT JUDGE

UNITED STATES DISTRICT COURT FOR THE DISTRICT OF COLUMBIA

DONALD RAYNOR, JR., ET AL.,
Plaintiffs,
vs.
RICHARDSON-MERRELL INC.,
Defendants.

Civil Action No. 83-3506
(Judge Thomas F. Hogan)

MEMORANDUM OF DEFENDANTS MERRELL DOW PHARMACEUTICALS INC. AND STANDARD DRUG COMPANY INC. IN SUPPORT OF MOTION IN LIMINE REGARDING THE SCOPE OF STATISTICAL EVIDENCE TO BE OFFERED AT TRIAL

Defendants Merrell Dow Pharmaceuticals Inc. ("Merrell Dow"), sued herein under its former name Richardson-Merrell Inc., and Standard Drug Company Inc. ("Standard") respectfully submit this memorandum in support of their motion in limine regarding the scope of the scientific evidence to be offered at trial.

One of the primary issues at trial will be whether the drug Bendectin, manufactured by Merrell Dow, causes birth defects. Both parties will rely on a number of epidemiological studies in presenting their arguments on this issue of causation. These studies involve statistical analysis of populations of individuals with respect to Bendectin usage and birth defects.

In order to be admissible, statistical evidence must conform to generally accepted standards used by professional epidemiologists and statisticians. As will be discussed below, in order for a study to show a valid statistical relationship between exposure to a drug and a disease or malformation, it must be "statistically significant" at
the 95% level. Merrell Dow requests an order providing that any statistical proof concerning a relationship between Bendectin and birth defects be limited to those studies which show such a statistically significant association. If a study does not, that statistical evidence is not of a type reasonably relied upon by experts in the field to which it pertains and cannot form the basis for expert opinion at trial. The danger of prejudice or of misleading or confusing the jury requires that evidence of alleged associations that are not statistically significant be excluded.

Further, with respect to plaintiffs' attacks on alleged flaws in the epidemiological studies that have been conducted on Bendectin, Merrell Dow requests an order providing that those flaws must necessarily alter the authors' conclusions and result in a statistically significant association at the 95% level in order for any such alleged flaws to be material and to be presented in plaintiffs' case in chief.

Identical motions have been granted by Judge Jackson in Richardson v. Richardson-Merrell Inc., No. 83-3505 (D.D.C. June 9, 1986), and Judge Johnson in Koller v. Richardson-Merrell Inc., No. 80-1258 (D.D.C. February 25, 1983), discussed below.1/

1/ A copy of Judge Jackson's Order is attached hereto as Exhibit A. A copy of Judge Johnson's decision is attached hereto as Exhibit B. Judge Jackson's Order states that the Order is granted preliminarily.

I. INTRODUCTION

A. Statistical Significance

Medicine is concerned with limiting or preventing disease. Once the cause of a particular disease is established, the ultimate objective is to prevent its occurrence. A factor may be said to cause a disease when its presence is shown to contribute to the development of the disease and its removal is shown to reduce the frequency of the disease.

The first step in determining whether there is a relationship between exposure and a disease or malformation is to determine whether a statistical association exists.
Lilienfeld, A., Lilienfeld, D., Foundations of Epidemiology 289 (2d ed. 1980), Exhibit C. In other words, is there a statistical relationship between exposure to the drug and the occurrence of the disease or malformation? If there is no statistical relationship, then there is no reason to suspect the exposure. In the field of congenital malformations, the initial determination of whether a statistical association exists is made through epidemiological studies.

Because of the danger of drawing incorrect conclusions from statistical evidence, scientists use great care before concluding that a statistical association exists between exposure and a disease. Scientists generally require a confidence level of 95% before determining that an association is "statistically significant." D. Freedman, R. Pisani, R. Purves, Statistics at 444 (1980), Exhibit D.

The use of a 95% confidence level is consistent not only with the published texts but also with the opinions of plaintiffs' experts in this case. For example, Dr. Done has testified as follows:

Q. So in your FDA study you are telling me that you simply never addressed the issue of causation?
A. No. That is not correct. You use the level of confidence that the difference exists at in drawing your conclusions about whether causation is likely. And there you would use whatever standard is required by whomever is asking that question. Scientifically, you would not say that you could show a probability of causation from whatever association you find, unless you find that at a 95% level. That is the scientific level that scientists conventionally use.

(Emphasis added.) Deposition of Alan K. Done at 225-26.2/

As will be discussed later, the 95% level is generally accepted in the scientific community. Dr. Done again acknowledged this point:

Q. Before you start the exercise [referring to causation], that is the minimum confidence level, that is 95%?
A. Yes.
Q. That is the one that is generally accepted by the scientific community, as I understand.
A. Yes.
Q. You accept that too?
A. Yes.

Done Dep. at 226, Exhibit E.

2/ Deposition of Dr. Alan K. Done in Schumacher v. E.R. Squibb & Sons, No. 84-2955 (C.D. Cal.), taken on July 2, 1985 ("Done Dep."). Copies of the cited pages are attached hereto as Exhibit E.

Another of plaintiffs' experts in this case, Dr. Nancy Lord, stated:

Q. Would it be true that you cannot draw any cause and effect inferences or conclusions from epidemiologic studies where the association is not statistically significant?
A. Yes, in general I'd agree.

Deposition of Nancy T. Lord at 206.3/

A statistically significant relationship does not itself establish causation,4/ but it is a critical first step. Exhibit C at 289.

3/ Deposition of Dr. Nancy Lord in Velleff v. Ortho Pharmaceutical Corp., No. 80 L 4988 (Cir. Ct. Cook County, Ill.), taken on July 31-Aug. 2, 1985. Copies of the cited pages are attached hereto as Exhibit F.

4/ Once a statistically significant association has been found, a number of other factors must be evaluated before a conclusion as to causation can be reached. The Advisory Council to the Surgeon General of the Public Health Service in 1964 defined five criteria that should be fulfilled to establish a causal relationship: (1) the consistency of the association; (2) the strength of the association; (3) the specificity of the association; (4) the temporal relationship of the association; and (5) the coherence of the association.

1. The Anticipated Problem

Merrell Dow anticipates that plaintiffs will in this action, as they have in other Bendectin cases, attempt to introduce expert testimony based on statistical analyses which are not generally accepted in the scientific community. Merrell Dow believes that plaintiffs' experts will attempt to ignore the limitations on statistical
significance and testify that associations from epidemiological studies are evidence of causation even though the 95% confidence level is not met. The rationale appears to be that plaintiffs' experts are entitled to ignore scientific principles when they testify in court: they can substitute the legal test of "more probable than not" and say that a statistical association is significant at a 51% level of confidence. Thus, plaintiffs seek to inject mathematical evidence into this case while ignoring the inherent limitations of that evidence. It is Merrell Dow's position that, under the Rules of Evidence, experts are not entitled to ignore generally accepted scientific principles when testifying.

2. The Richardson and Koller Decisions

There is precedent directly on point in this District in other Bendectin cases. In Richardson v. Richardson-Merrell Inc., No. 83-3505 (D.D.C. June 9, 1986), Judge Jackson recently granted, preliminarily, defendant's motion in limine regarding the scope of statistical and scientific evidence to be offered at trial.5/ In his Order Judge Jackson provided that "plaintiffs shall present no evidence absent a showing of statistical . . . ."

Similarly, in Koller v. Richardson-Merrell Inc., No. 80-1258 (D.D.C. February 25, 1983), Judge Norma Holloway Johnson granted Merrell Dow's motion to limit plaintiffs' proof:

With respect to statistical significance, no statistical evidence will be admitted during the course of the trial unless it meets a confidence level of 95%.

Id. at 1-2 (footnote omitted).6/ In its opinion, the Court cited a series of cases establishing that statistical evidence is inadmissible unless it meets the 95% confidence level generally accepted by statisticians. These authorities are discussed at page 16.

5/ A copy of Judge Jackson's Order is attached hereto as Exhibit A.

6/ A copy of Judge Johnson's decision is attached hereto as Exhibit B.

B. Order of Proof

Under the best of circumstances, the jury will have a difficult task understanding the epidemiological proof presented at trial. Since none of the statistically significant epidemiological studies supports plaintiffs' case on limb defects, Merrell Dow anticipates that plaintiffs will attempt in their case in chief to attack the epidemiological studies without regard to whether the alleged flaws would alter the results of the studies. As a result of these attacks, the jury will eventually be confused and lose sight of the fact that plaintiffs are obliged to produce statistically significant evidence to prove causation. In Koller, Judge Johnson recognized the potential for confusion and concluded:

As for plaintiffs' plans to attack the epidemiological and animal studies of Bendectin, such evidence may not be presented in plaintiffs' case in chief unless plaintiffs first establish a foundation that the particular flaw would alter the conclusions of the study in a statistically significant manner.

* * *

Plaintiffs have no right to attempt to preempt defendants' anticipated defense in their case in chief by advancing alleged weaknesses in studies that are favorable to defendant. This approach would confuse the jury and would badly obscure the fundamental requirement that plaintiffs prove that Bendectin is a teratogen that caused the birth defects of Anne Koller.

Koller, mem. op. at 2-3, Exhibit B.

II. ARGUMENT

A. It Is Essential That The Court Exercise Its Inherent Authority To Control The Mode And Order Of The Presentation Of Scientific Evidence At Trial In Order To Prevent The Jury From Being Confused Or Misled

As developed under the common law, the judge has broad powers to control the mode and order of the presentation of evidence at trial. Moreover, the trial judge has an obligation to exercise that power in appropriate circumstances. Koller, mem. op. at 3, Exhibit B.
Rule 611 of the Federal Rules of Evidence codifies this common law power and responsibility. Rule 611 provides, in pertinent part, that "[t]he court shall exercise reasonable control over the mode and order of presenting evidence so as to (1) make the interrogation and presentation effective for the ascertainment of the truth [and] (2) avoid needless consumption of time." Fed. R. Evid. 611(a); Baker v. United States, 401 F.2d 958, 987 (D.C. Cir. 1968), cert. denied, 400 U.S. 965 (1970); Wright v. United States, 183 F.2d 821, 822 (D.C. Cir. 1950); United States v. Bender, 218 F.2d 869, 874 (7th Cir. 1955).

This power is particularly critical in a case such as this, which involves complex scientific evidence. The California Supreme Court has noted that "mathematics, a veritable sorcerer in our computerized society, while assisting the trier of fact in the search for truth, must not cast a spell over him." People v. Collins, 68 Cal. 2d 319, 320, 438 P.2d 33 (1968). Courts are mindful of the danger that testimony expressing opinions or conclusions in terms of statistical probabilities may mislead and confuse the jury. United States ex rel. DiGiacomo v. Franzen, 680 F.2d 515 (7th Cir. 1982); United States v. Massey, 594 F.2d 676, 681 (8th Cir. 1979). The concern is that statistical evidence will have a potentially exaggerated impact on the trier of fact. "Testimony expressing opinions or conclusions in terms of statistical probabilities can make the uncertain seem all but proven." State v. Carlson, 267 N.W.2d 170 (Minn. 1978). Professor Tribe has noted that:

the very mystery that surrounds mathematical arguments - the relative obscurity that makes them at once impenetrable by the layman and impressive to him - creates a continuing risk that he will give such arguments a credence they may not deserve and a weight they cannot logically claim.

Tribe, Trial By Mathematics: Precision and Ritual In The Legal Process, 84 Harv. L. Rev. 1329, 1334 (1971).
The mystique surrounding statistics merely compounds the problem. In EEOC v. Federal Reserve Bank, 698 F.2d 633 (4th Cir. 1983), rev'd on other grounds, 467 U.S. 867, 104 S. Ct. 2794 (1984), the United States Court of Appeals for the Fourth Circuit noted that:

[i]naccuracies or variations in data or in the formulae used to test such data may easily lead to different, contradictory, or even misleading conclusions by experts. This fact prompted one court to comment that too often statistical conclusions "appear to depend in large part on the side producing them."

Id. at 645, quoting Stastny v. Southern Bell Telephone & Telegraph Co., 458 F. Supp. 314, 324 (W.D.N.C.), aff'd in part and reversed in part, 628 F.2d 267 (4th Cir. 1980). The manipulability of statistics has caused, and should cause, courts great concern:

[S]tatistical evidence, like any other type of circumstantial evidence, "must not be accepted uncritically," and, because of the sophistication and complexity of many of the statistical models being used in discrimination cases by professional econometricians, courts must give "close scrutiny [to the] empirical proof" on which the models are erected, in order to guard against the use of statistical data which may have been "segmented and particularized and fashioned to obtain the desired result."

EEOC v. Federal Reserve Bank, 698 F.2d at 645-46 (citations omitted).

Even expert statisticians can overlook critical factors in performing statistical analyses. Tribe, 84 Harv. L. Rev. at 1363. "[I]f [even the experts] were seduced by the mathematical machinery, one is entitled to doubt the efficacy of even the adversarial process as a corrective to the jury's natural tendency to be similarly distracted." Id. (footnote omitted). Professor Tribe noted:

[t]he problem of the overpowering number, that one hard piece of information, is that it may dwarf all efforts to put it into perspective with more impressionistic sorts of evidence.
The problem - that of the overbearing impressiveness of numbers - pervades all cases in which the trial use of mathematics is proposed.

Id. at 1360-61.

In spite of all these dangers, the Court cannot simply jettison statistics in this case. For better or worse, statistical analysis provides the most direct evidence of whether or not Bendectin causes birth defects. The evidence must be used in spite of its potential dangers. In re "Agent Orange" Products Liability Litigation, 603 F. Supp. 239 (E.D.N.Y. 1985), 611 F. Supp. 1223 (E.D.N.Y. 1985) (properly developed epidemiological evidence is sound, reliable, and must be relied on in mass products liability litigation); Terrell v. United States, 517 F. Supp. 374, 379 (N.D. Tex. 1981) (rejecting finding of causation as speculative in the absence of epidemiological evidence or scientific understanding as to causation); Heyman v. United States, 506 F. Supp. 1145, 1149 (S.D. Fla. 1981) (physician cannot make accurate prediction of causation without at least some reference to epidemiological studies).

Faced with the tension between the need for epidemiological evidence and its substantial potential for confusion, how then can epidemiological evidence be used without being abused? The Court has at its disposal several measures that enable it to regulate the type and quality of the statistical proof admitted at trial. Fed. R. Evid. 611, 403, and 703. In addition, Section 1.80 of the Manual for Complex Litigation provides that, when desirable to expedite the case, the court should provide an efficient method for submission and determination of preliminary legal questions. Manual for Complex Litigation 1.80, cited with approval in Tcherepnin v. Franz, 461 F.2d 544, 548 n.4 (7th Cir.), cert. denied, 409 U.S. 1038 (1972); Control Data Corp. v. International Business Machines Corp., 306 F. Supp. 839, 852 (D. Minn. 1969), appeal dismissed, 421 F.2d 323 (8th Cir.), affirmed sub nom. Data Processing Financial & General Corp. v.
International Business Machines Corp., 430 F.2d 1277 (8th Cir. 1970).7/

7/ In simplifying the proof for submission to the jury, the court has power to limit the evidence. Manual for Complex Litigation 4.30 at 184-85; United States v. Maryland & Virginia Milk Producers Ass'n, 20 F.R.D. 441 (D.C. Cir. 1957) (reducing the period covered by the evidence to a reasonable length). Similarly, the courts have excluded statistical proof that did not relate directly to the ultimate question of discrimination to be resolved in employment discrimination cases under Title VII and the Equal Protection Clause. New York Transit v. Beazer, 440 U.S. 568 (1979); Coe v. Yellow Freight System, Inc., 646 F.2d 444, 452 (10th Cir. 1981).

The need for a thorough analysis under Fed. R. Evid. 703 is illustrated by the recent United States Supreme Court decision in Matsushita Electric Industrial Co. v. Zenith Radio Corp., No. 83-2004, slip op. (March 26, 1986), which expressly approved the lower court's detailed analysis under Fed. R. Evid. 703 excluding plaintiffs' proffered expert testimony. Slip op. at 18 n.19. (A copy of this opinion is attached as Exhibit G hereto.) This motion raises just such a preliminary question.

B. Admissibility of Epidemiological Evidence

In order to be admissible, scientific evidence must first be of the type generally accepted by experts in the particular field to which that evidence pertains. Only then can it reasonably be relied upon by expert witnesses. Fed. R. Evid. 703. There is essentially no dispute as to the general principles this Court should apply in determining whether statistical evidence is of the type generally accepted by statisticians and epidemiologists. There may, however, be some confusion in incorporating these principles into the standard of proof at trial.
It is important, therefore, that the jury, the Court, and the parties understand the threshold requirement: the Court, not the jury, must first determine that statistical evidence conforms to generally accepted scientific principles before that evidence can be used by expert witnesses as a basis for their opinion testimony.

1. Only Statistical Evidence That Conforms To Generally Accepted Statistical and Epidemiological Principles Can Be Relied On As A Basis for Expert Testimony

In Frye v. United States, 54 App. D.C. 46, 293 F. 1013 (D.C. Cir. 1923), the U.S. Court of Appeals for the District of Columbia Circuit set forth the standard by which questions of admissibility of expert testimony based on methods of scientific measurement are to be resolved. United States v. Addison, 498 F.2d 741 (D.C. Cir. 1974). Frye requires that scientific evidence be excluded unless the process, system, or theory on which the evidence is based is "sufficiently established to have gained general acceptance in the particular field to which it belongs." 293 F. at 1014. The Frye standard has been applied to a wide variety of scientific evidence, including radar, public opinion surveys, breathalyzers, psycholinguistics, trace metal detection, bite mark comparisons, blood-spattering deductions, and psychological stress syndromes, as well as a range of other studies, experiments, and tests.

Courts have recognized that the foundational prerequisites set forth in Frye are needed to predict, and protect against, the possible prejudicial dangers inherent in any expert scientific testimony used at trial:

There are good reasons why not every ostensibly scientific technique should be recognized as a basis for expert testimony. Because of its apparent objectivity, an opinion that claims a scientific basis is apt to carry undue weight with the trier of fact.
In addition, it is difficult to rebut such an opinion except by other experts or by cross-examination based on a thorough acquaintance with the underlying principles. In order to prevent deception or mistake and to allow the possibility of effective response, there must be a demonstrable, objective procedure for reaching the opinion and qualified persons who can either duplicate the result or criticize the means by which it was reached, drawing their own conclusions from the underlying facts.

United States v. Brady, 595 F.2d 359, 362-63 (6th Cir.), cert. denied, 444 U.S. 862 (1979), quoting United States v. Brown, 557 F.2d 541, 566 (6th Cir. 1977), and United States v. Baller, 519 F.2d 463, 466 (4th Cir.), cert. denied, 423 U.S. 1019 (1975). Thus, the Frye standard enhances the search for the truth and ensures fairness in the presentation and review of scientific evidence.

2. Epidemiology and Causation

Numerous problems can be avoided if the Court requires the epidemiological evidence to conform to generally accepted epidemiological and statistical principles. Epidemiology is the only generally accepted scientific discipline that uses statistical techniques to identify the causes of human disease. It allows a scientific estimate of the degree of risk of a disease or condition that can be attributed to a given factor, such as exposure to an allegedly harmful drug. Epidemiology provides courts with a rational and consistent method for evaluating evidence of causation between exposure to a given factor and the incidence of disease.

Epidemiology has been described as a two-step process, beginning with statistical analysis and then attempting to draw biological conclusions from the results of that analysis:

Basically, the epidemiologist uses a two-stage sequence of reasoning:
1. The determination of a statistical association between a characteristic and a disease;
2. The derivation of biological information from such a pattern of statistical associations.
Lilienfeld, A., Lilienfeld, D., Foundations of Epidemiology at 13 (2d ed. 1980), Exhibit C.

The epidemiologist attempts to discern the relationship between a disease and a factor suspected of causing it. This relationship is developed by comparing the disease experience of people exposed to the factor with that of people not exposed to the factor. Id. at 3. This relationship between the factor and the disease is known as an "association."

In the first step of the epidemiological process, development of an association, epidemiologists use statistical concepts to determine whether an association exists between a factor, such as exposure to a drug, and a disease condition, and, if so, how large that association is. The epidemiologist compares the incidence rate of the condition being studied among those exposed to the factor with the rate among those who are not exposed. These incidence rates are a measure of the probability that an individual will develop the condition. In effect, the epidemiologist is trying to determine whether exposure to the factor increases the probability that an individual will develop the condition.

There are several types of epidemiological studies. Two principal types of studies have been performed on Bendectin: cohort and case control.8/ In both, a statistic is developed that represents the increased risk of birth defects, if any, that may result from Bendectin use. Before this statistic can be used as a basis for concluding that there is an association, however, the investigator must assess whether any difference in the incidence rates of birth defects between the groups studied is real or merely the result of chance, a possibility that arises because the investigator has examined fewer than all of the individuals in the group being studied. Only if the investigator can be relatively certain that the difference is not due to chance can the rates be used to say anything about whether an association exists.
The process of testing the rates to see whether there is any true difference between them is called "significance testing." If the investigator can reasonably exclude the possibility of chance, the difference is said to be "significant."

8/ The first type, called a cohort study, involves two groups of people, one exposed to the factor and one not exposed. Exhibit C at 226-27. The investigator follows these two groups and observes the incidence rates of the condition in each group. The second type of study is called a "retrospective" or "case control" study. Rather than looking at the rate at which the condition occurs in groups that are exposed and not exposed, a case control study begins with individuals who already have the condition and individuals who do not. The investigator then examines past exposures to the factor to determine whether the group that has the condition was exposed to the factor more frequently than the group that does not. As with a cohort study, the investigator must first determine that any difference in rates of exposure between the two groups is in fact real before any inference can be drawn from the rates. Similar methods of significance testing are applied to case control studies and cohort studies.

If the statistic is significant, the next step is to estimate the magnitude of the association. The generally accepted means of measuring an association in a cohort study is to calculate what is called the "relative risk." This is simply the incidence rate of the condition being studied in the group exposed to the factor divided by the rate in the group that was not exposed. In a case control study the measure is called an "odds ratio."9/ The odds ratio is based on the rate of usage of the drug among patients who have the defect under study compared to the usage rate among patients who do not. The relative risk and odds ratio are both estimates of the magnitude of any association that can be drawn from the data.
They are known as "point estimates." If there is no association between the factor and the disease, the point estimate is 1.0; that is, the rates of the two groups are equal. Thus, before a point estimate can be used as a basis for inferring anything about an association, it must be shown to be significant. That is, the difference that gave rise to the point estimate must be real, and the investigator must be reasonably certain that it is not due to chance, simply as an artifact of the "luck of the draw." Only if the point estimate is significant is the epidemiologist or statistician concerned with whether it is large enough in magnitude to support the conclusion that there is an association between the factor and the condition or disease.10/

9/ Because of inherent limitations on the design of case control studies, the investigator cannot directly determine incidence rates among the exposed and non-exposed groups. Accordingly, a relative risk cannot be calculated directly. Cornfield, "A Method Of Estimating Comparative Rates From Clinical Data: Applications to Cancer of the Lung, Breast and Cervix," 11 J. National Cancer Institute 1269 (1951). A closely related calculation, the odds of exposure among cases divided by the odds of exposure among controls, however, can be calculated. This measure is called the "odds ratio" for a case control study. When the condition under study is rare, the odds ratio closely approximates the relative risk. Hence, the two can be used almost interchangeably as measures of an association.

Only after both of these criteria are satisfied can the epidemiologist move to the second stage of the epidemiological reasoning process: developing biological inferences from a pattern of statistical associations. The process of drawing biological inferences from an association is beyond the scope of this motion. The existence of an association, however, is a threshold requirement that must be satisfied before any
biological inference can be drawn from it. The requirement that the association be real, that is to say "significant" in statistical terms, is in turn a threshold requirement that must be satisfied before the investigator can conclude that an association in fact exists.

10/ The greater the magnitude of the observed relative risk, the stronger the association between the factor and the disease. When a statistically significant relative risk of ten or more is found, one can be certain that the factor causes the disease or condition.

It is critical that statistical evidence conform to these generally accepted principles of statistical significance before it can be relied on by expert witnesses who will provide their opinions as to causation on the basis of that evidence. If the association is not significant, it cannot reasonably be relied upon by an expert. The degree of significance required by epidemiologists and statisticians in order to be relatively certain that an association is not due simply to chance is relatively high. Merrell Dow anticipates that plaintiffs will argue that it is "too high" and that the Court should allow plaintiffs' experts to give their opinion on causation based on statistical evidence that epidemiologists or statisticians would not accept as reflecting a true association. In so doing, Merrell Dow anticipates that plaintiffs will argue that an association need only be more likely than not real rather than due to chance.

C. The Epidemiological Evidence Must Be Statistically Significant At A 95% Confidence Level Before It Is Admissible Into Evidence Or Can Be Relied On By Expert Witnesses At Trial

Epidemiologists and statisticians universally use a confidence level of 95% in testing whether the differences between the rates of disease or exposure in two groups are real. This means that epidemiologists and statisticians demand that, if there were no true difference between the groups, a difference as large as the one observed would arise from the "luck of the draw" no more than 5% of the time.
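The arithmetic described in this section (relative risk, odds ratio, and a test at the 95% level) can be illustrated with a minimal Python sketch. It is not drawn from any study cited in this memorandum: all counts are invented, the function names `z_value`, `relative_risk`, `odds_ratio`, and `significant` are hypothetical, and the confidence intervals use standard large-sample log-scale approximations (Katz for the relative risk, Woolf for the odds ratio), which are assumptions of this sketch rather than the method of any particular author. An estimate is treated as significant at the chosen level when its interval excludes 1.0, the value corresponding to no association.

```python
import math
from statistics import NormalDist


def z_value(confidence):
    """Two-sided normal critical value (about 1.96 at the 95% level)."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def relative_risk(exp_cases, exp_total, unexp_cases, unexp_total, confidence=0.95):
    """Cohort study (assumed Katz log-scale interval): incidence among the
    exposed divided by incidence among the unexposed."""
    rr = (exp_cases / exp_total) / (unexp_cases / unexp_total)
    se = math.sqrt(1 / exp_cases - 1 / exp_total + 1 / unexp_cases - 1 / unexp_total)
    z = z_value(confidence)
    return rr, rr * math.exp(-z * se), rr * math.exp(z * se)


def odds_ratio(case_exp, case_unexp, ctrl_exp, ctrl_unexp, confidence=0.95):
    """Case control study (assumed Woolf log-scale interval): odds of exposure
    among cases divided by odds of exposure among controls."""
    est = (case_exp / case_unexp) / (ctrl_exp / ctrl_unexp)
    se = math.sqrt(1 / case_exp + 1 / case_unexp + 1 / ctrl_exp + 1 / ctrl_unexp)
    z = z_value(confidence)
    return est, est * math.exp(-z * se), est * math.exp(z * se)


def significant(lower, upper):
    """Significant at the chosen level if the interval excludes 1.0 (no association)."""
    return lower > 1.0 or upper < 1.0


# Hypothetical cohort: 30 of 1,000 exposed and 20 of 1,000 unexposed have the condition.
rr, lo, hi = relative_risk(30, 1000, 20, 1000)
print(f"relative risk {rr:.2f}, 95% CI {lo:.2f}-{hi:.2f}, significant: {significant(lo, hi)}")

# Hypothetical case control study: 60 of 200 cases and 40 of 200 controls were exposed.
est, lo, hi = odds_ratio(60, 140, 40, 160)
print(f"odds ratio {est:.2f}, 95% CI {lo:.2f}-{hi:.2f}, significant: {significant(lo, hi)}")
```

With these invented counts, the cohort's point estimate of 1.5 is not significant at the 95% level because its interval includes 1.0, while the case control interval lies entirely above 1.0. The log scale is used because ratio estimates are skewed; other interval methods could give somewhat different limits.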
The epidemiological studies done on Bendectin and its components have been based on generally accepted scientific principles. Because these studies do not, taken as a whole, show a statistically significant association between Bendectin and an increased incidence of birth defects, plaintiffs have attempted to distort the accepted standards of statistical significance in order to arrive at conclusions contrary to those of the authors of the studies. This distortion violates basic principles of statistics and epidemiology. The lower confidence levels suggested by plaintiffs are not generally accepted by epidemiologists and statisticians and should not be accepted into evidence in this case.

1. Epidemiologists And Statisticians Generally Require That An Association Be Significant At The 95% Confidence Level Before It Can Be Accepted As A True Association

Epidemiologists must be able to determine the probability that an observed statistical association is due to chance or to errors in the sampling of data instead of reflecting a true association. The 95% confidence level has become established in the scientific community as the standard for associations that are "statistically significant." This 95% confidence level is also referred to as the 5% "significance level." A 99% confidence level is sometimes used to define results considered "highly statistically significant." D. Freedman, R. Pisani, R. Purves, Statistics at 444 (1980), Exhibit D; T. Wonnacott, R. Wonnacott, Introductory Statistics at 252 n.16 (3d ed. 1977), Exhibit H. Epidemiologists generally, and all of the experts whose studies will be offered by Merrell Dow in this case,11/ use a 95% confidence level in determining whether the differences they observe are likely to be due to chance alone. Testimony of Olli Heinonen in Mekdeci v. Merrell-National Laboratories Inc., No. 77-255-Orl-Civ-Y (M.D. Fla. 1981), aff'd, 711 F.2d 1510 (11th Cir. 1983), Tr. at 4505-4510, Exhibit I; Testimony of Brian MacMahon in Mekdeci, Tr. at 5420-21, 5434-35, Exhibit J.

The published epidemiological studies on Bendectin or its components generally use a confidence level of 95%.12/ See, e.g., L. Milkovich and B.J. van den Berg, "An evaluation of the teratogenicity of certain antinauseant drugs," Amer. J. Obstet. Gynec. 125(2):244-248 (May 15, 1976), Appendix I, Exhibit 2; G. Greenberg, et al., "Maternal Drug Histories and Congenital Abnormalities," Brit. Med. J. 2:853-56 (October 1977), Appendix I, Exhibit 6; G.T. Gibson, et al., "Congenital Anomalies in Relation to the Use of Doxylamine/Dicyclomine and Other Antenatal Factors," Med. J. Aust. 1:410-414 (April 18, 1981), Appendix I, Exhibit 11; J.F. Cordero, G.P. Oakley, et al., "Is Bendectin a Teratogen?," J. Am. Med. Assoc. 245(22):2307-2310 (June 12, 1981), Appendix I, Exhibit 13; Heinonen, Slone and Shapiro, "Birth Defects and Drugs in Pregnancy," Publishing Sciences Group, Inc., Littleton, Mass. (1977), Appendix I, Exhibit 4.

The 5% level for statistical significance, as well as the 1% level for highly statistically significant findings, is an arbitrary cutoff point. It is, however, universally accepted in the scientific community. This line is reflected in the practice of academic journals in accepting articles for publication: they universally use the 5% level of statistical significance. Freedman at 493, Exhibit D. A significance level greater than 5%, corresponding to a confidence level of less than 95%, is not acceptable in the profession as establishing statistical significance. This principle is accepted by plaintiffs' experts in this case. (See pp. 2-3, supra.)

11/ Merrell Dow has filed simultaneously with this motion a motion to admit the relevant epidemiological studies on Bendectin into evidence. References to Appendices I and II contained herein are to those Appendices contained in that motion.

12/ The Michaelis study used a 90% confidence interval.
J. Michaelis, et al., "Prospective Study of Suspected Associations Between Certain Drugs Administered During Early Pregnancy and Congenital Malformations," Teratology 27:57-64 (1983); "Teratogene Effekte von Lenotan?" (Does Lenotan Have Teratogenic Effects?), Deutsches Arzteblatt 23:1527-1529 (June 1980) (English translation), Appendix I, Exhibit 8. It did so in a so-called "two-tailed" test. The issue whether a so-called "one-tailed" or "two-tailed" test is appropriate is a minor and extremely confusing issue. See Freedman, Statistics at 494-96, Exhibit D. Merrell Dow does not assert that Bendectin has any protective effect with respect to birth defects. Further, the lower confidence limit is the same regardless of which test is used. Hence, the differences between "one-tailed" and "two-tailed" tests of significance are not relevant. See Koller at 1 n.*, Exhibit B. What is important is that a 95% confidence level be used in accordance with generally accepted statistical principles.

The 95% requirement has been recognized time and again by the courts. It is now firmly ensconced in judicial precedent as well as scientific practice. Courts have held that scientific evidence must conform to the standards recognized by professional statisticians, including the 95% confidence level as a measure of statistical significance. In Moultrie v. Martin, 690 F.2d 1078, 1082-85 (4th Cir. 1982), plaintiff sought to demonstrate discriminatory selection of grand jurors based in part on a showing of historical underrepresentation. Plaintiff failed, however, to calculate the statistical significance of the figures presented. The court stated:

When a litigant seeks to prove his point exclusively through the use of statistics, he is borrowing the principles of another discipline, mathematics, and applying these principles to the law. In borrowing from another discipline, a litigant cannot be selective in which principles are applied.
He must employ a standard mathematical analysis. Any other requirement defies logic to the point of being unjust. Statisticians do not simply look at two statistics, such as the actual and expected percentage of blacks on a grand jury, and make a subjective conclusion that the statistics are significantly different. Rather, statisticians compare figures through an objective process known as hypothesis testing . . . [w]ithout the use of hypothesis testing, a court may give weight to statistical differences which are actually mathematically insignificant. . . . For this reason it is particularly important that courts follow such formulae before drawing conclusions from statistical evidence, and we so require it.

Id. at 1082-83 (citations omitted). In Moultrie, 690 F.2d at 1083 n.7, the court recognized that statisticians usually state their conclusions in terms of "whether the difference between actual and expected values is statistically significant at a given confidence level. Statisticians usually use 95% or 99% confidence levels."

Generally, courts have required that statistical significance testing conform to a 95% confidence level. Little v. Master-Bilt Products, Inc., 506 F. Supp. 319, 327 n.7 (N.D. Miss. 1980) (noting that a 95% confidence level was required but that an even higher level may be required in some cases); Taylor v. Teletype Corp., 475 F. Supp. 958, 962 (E.D. Ark. 1979), cert. denied, 454 U.S. 969 (1981); EEOC v. American National Bank, 652 F.2d 1176, 1192 (4th Cir. 1981). The Supreme Court has repeatedly held that significance testing must be used in analyzing statistical evidence. Castaneda v. Partida, 430 U.S. 482, 496 n.17 (1977); Hazelwood School District v. United States, 433 U.S. 299, 307-11 (1977); Mayor of Philadelphia v. Educational Equality League, 415 U.S. 605, 619-21 (1974). In so stating, the United States Supreme Court adopted a 95% confidence level in a series of cases involving employment discrimination.13/

The Court noted in Castaneda that some fluctuation from the expected number is anticipated in any statistical measure:

The important point, however, is that the statistical model shows that the results of a random drawing are likely to fall in the vicinity of the expected value. The measure of the predicted fluctuations from the expected value is the standard deviation. . . . [I]f the difference between the expected value and the observed number is greater than two or three standard deviations, then the hypothesis that the jury drawing was random would be suspect to a social scientist.

Castaneda, 430 U.S. at 496 n.17 (citation omitted). The two or three standard deviations referred to in Castaneda correspond to confidence levels of 95 to 99%. Moultrie, 690 F.2d at 1084 n.10. The Court reinforced its holding in Castaneda in Hazelwood School District v. United States, 433 U.S. 299, 308 n.14, 311 n.17 (1977), adopting the Castaneda standard of statistical significance testing based on fluctuations from the expected value of two or three standard deviations.

13/ The Court's rulings on statistical significance in discrimination cases are equally applicable in the context of the Bendectin cases. The statistical analysis employed with respect to employment discrimination is substantially identical to that employed in epidemiology. The only meaningful difference is that, whereas a doubling of the relative risk may be needed in an epidemiological study in order to show that the condition or disease is more likely than not to have been caused by the alleged teratogen, a lower point estimate may well suffice in a discrimination case, where there would be expected to be no background rate of discrimination. See Castaneda v. Partida, 430 U.S. 482 (1977); Hazelwood School District v. United States, 433 U.S. 299 (1977); Mayor of Philadelphia v. Educational Equality League, 415 U.S. 605 (1974).
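The Castaneda "two or three standard deviations" rule can be sketched in Python under a simple binomial model of random selection, in which the expected count is n*p and the standard deviation is the square root of n*p*(1-p). The juror figures below are hypothetical. The second function shows how much of a normal distribution lies within k standard deviations of its mean: about 95.4% for two and 99.7% for three, which is the sense in which two or three standard deviations roughly correspond to the 95% and 99% confidence levels (the exact two-sided 95% and 99% cutoffs are about 1.96 and 2.58 standard deviations).

```python
import math


def standard_deviations_from_expected(observed, n, p):
    """Number of standard deviations separating an observed count from the count
    expected under random selection (binomial model: mean n*p, sd sqrt(n*p*(1-p)))."""
    expected = n * p
    sd = math.sqrt(n * p * (1 - p))
    return (observed - expected) / sd


def coverage_within(k):
    """Share of a normal distribution lying within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))


if __name__ == "__main__":
    # Hypothetical: 1,000 jurors drawn from a population that is 40% group members;
    # 400 group members would be expected, but only 350 are observed.
    z = standard_deviations_from_expected(350, 1000, 0.40)
    print(f"z = {z:.2f}")  # about -3.23: more than three standard deviations below expected
    for k in (1.96, 2.0, 2.576, 3.0):
        print(f"within {k} sd: {coverage_within(k):.4f}")
```

Under these assumed numbers the shortfall exceeds three standard deviations, so on the Castaneda reasoning the hypothesis of random selection would be suspect.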
This view is supported by the decision of Judge Jackson in another case involving virtually identical allegations, Richardson v. Richardson-Merrell Inc., No. 83-3505 (D.D.C. June 9, 1986), where plaintiffs were precluded from presenting any statistical or scientific evidence absent a showing of statistical significance. Another court in this District has also held, in a case involving Bendectin, that no statistical evidence will be admitted during the course of the trial "unless it meets a confidence level of 95%." Koller v. Richardson-Merrell Inc., No. 80-1258, mem. op. at 1 (D.D.C. February 25, 1983), Exhibit B. In Koller, Judge Norma Holloway Johnson sua sponte raised two evidentiary issues relating to the scope of the statistical evidence to be admitted at trial.14/ In requiring that statistical evidence conform to the 95% confidence level, Judge Johnson noted that every study examining whether Bendectin is a teratogen has employed a confidence level of at least 95%. Id. at 2. In addition, plaintiffs concede that social scientists routinely utilize a 95% confidence level. Finally, all legal authorities agree that statistical evidence is inadmissible "unless it meets the 95% confidence level required by statisticians." Koller, mem. op. at 2.

Particularly in view of the courts' recognition that the 95% confidence level is generally accepted in the scientific community, there is simply no justification for allowing lax and unprofessional statistical opinions and analyses at trial. Statistical evidence that does not conform to the 95% confidence level would not be tolerated outside the courtroom, nor would it be allowed in scientific journals. Plaintiffs' desire to alter the level of statistical significance ignores the well-established convention among epidemiologists.
Professional statisticians and epidemiologists recognize that significance at a level of less than 5% is an indication that additional confirmatory work should be done before moving forward with step two of the epidemiological reasoning process and attempting to derive biological inferences from the data.15/

Defendants respectfully submit that the appropriate level of statistical significance at which to consider the published epidemiologic studies in this case is the level used by the authors of the various studies and employed by the editors of the scientific journals in which they were published. Confidence levels of 95% or 99% (corresponding respectively to significance levels of 5% and 1%) are generally recognized and used by epidemiologists and statisticians. These significance levels are supported by the case law and the scientific literature. The use of any other level of statistical significance is unwarranted and would lead the Court to accept allegedly scientific "results," cloaked in the mystique of mathematics, that scientists themselves would not accept.

2. Statistics and the Standard of Proof

Any argument that an association drawn from an epidemiological study can be relied on if it is simply more likely than not to be a true association confuses basic principles of statistical significance with this Court's standard of proof. The two are distinct. Confusion in this regard will only mislead the jury.

Contrary to plaintiffs' contention, statistical significance does not equate to the burden of proof. Significance is a threshold issue, akin to a ruling by the Court on the admissibility of evidence. An association that is valid only at a 51% level of confidence is not probative of causation, regardless of the value of the point estimate. It cannot reasonably be relied upon by epidemiologists or statisticians.

The statistical measure that corresponds to the standard of proof is the magnitude of the association, not its significance. Where there is a background rate of a disease or condition, as there is with birth defects, it is necessary to show that the factor is associated with a doubling of the background incidence rate in order to infer causation. In other words, in order for it to be more likely than not that Bendectin, and not some other factor, was the cause of any individual's birth defect, the point estimate (1) would have to be statistically significant at the 95% confidence level and (2) would have to be greater than 2.0.16/ This point was recently recognized by the court in Marder v. G. D. Searle & Co., No. Y-82-3506, slip op. (D. Md. March 19, 1986), when the court stated:

In epidemiological terms, a two-fold increased risk is an important showing for plaintiffs to make because it is the equivalent of the required legal burden of proof -- a showing of causation has the preponderance of the evidence or, in other words, a probability of greater than 50%.

Mem. Op. at 14 (Exhibit K).

14/ Plaintiffs in Koller are now represented by the same plaintiffs' counsel representing the plaintiffs in this case.

15/ In addition, plaintiffs misconceive the nature and importance of statistical testing itself. The existence of one study showing a statistically significant association would not itself demonstrate causation. Even at the 5% significance level, statistically significant associations would occur by chance 5% of the time even if there is no true association. Freedman at 494, Exhibit D. Thus, when over 20 studies have been conducted, as they have been on Bendectin, at least one statistically significant association would be expected due to chance alone, even if no association exists in reality. The relevant epidemiological studies on Bendectin do not support any biological inference of causation.
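The arithmetic behind footnote 15 can be made concrete with a short Python sketch. It assumes, as a simplification, that each study is an independent test of a single association conducted at the 5% significance level and that no true association exists; real studies overlap in their data and typically examine many malformation categories, so the figures are illustrative only.

```python
def expected_chance_findings(num_tests, alpha=0.05):
    """Expected number of 'significant' results arising by chance alone."""
    return num_tests * alpha


def prob_at_least_one_chance_finding(num_tests, alpha=0.05):
    """Probability that at least one of num_tests independent tests is 'significant'
    by chance alone when no true association exists."""
    return 1 - (1 - alpha) ** num_tests


if __name__ == "__main__":
    for n in (1, 10, 20, 30):
        print(f"{n:>2} tests: expected chance findings = {expected_chance_findings(n):.2f}, "
              f"P(at least one) = {prob_at_least_one_chance_finding(n):.3f}")
```

Under these assumptions, 20 tests produce one chance "significant" finding on average, and the probability of at least one such finding is roughly 64%.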
Hence, the Court must wrestle with two statistical principles. The first is a threshold issue: whether an association is real, or statistically significant. This issue relates to the admissibility of evidence of the association. Only if the association is found to be significant does the Court come to the second issue, the magnitude of the association. If the association is not significant, it cannot be used to infer causation. It is the magnitude, or the importance, of the association that relates to the burden of proof. Marder, Mem. Op. at 24, Exhibit K. The magnitude goes to the weight of the evidence.

16/ Since there is a background rate of birth defects, a certain number of birth defects will be caused by factors other than the alleged teratogen, regardless of whether the substance is or is not teratogenic. Even if it can be shown by a statistically significant association greater than 1.0 that the factor is associated with the condition, a certain number of birth defects will be due to factors other than the alleged teratogen. If a statistically significant association of 1.5 is found, for example, only one out of every three cases of birth defects among the exposed could be attributable to the factor. Two of the three cases would be caused by factors contributing to the background rate. It would be twice as likely that any individual's defect was part of the background rate, rather than due to the factor. At a statistically significant association of 1.5, therefore, it would be more likely than not that the factor did not cause any particular individual's condition. Were a statistically significant association of 2.0 found, it would be equally likely that any individual plaintiff's condition was due to background factors as to the suspected teratogen. Only when a statistically significant association greater than 2.0 is found does it become more likely than not that any particular plaintiff's condition was due to the suspected teratogen.
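The figures in footnote 16 follow the standard epidemiological formula for the fraction of cases among exposed persons attributable to the exposure, (RR - 1) / RR. A minimal Python sketch reproduces them. It assumes, as the footnote does, that the relative risk is statistically significant, unbiased, and causal, and that it applies uniformly to each exposed individual.

```python
def attributable_fraction(rr):
    """Fraction of cases among the exposed attributable to the exposure: (RR - 1) / RR.
    A relative risk of 1.0 or less implies no excess cases."""
    if rr <= 1.0:
        return 0.0
    return (rr - 1.0) / rr


def more_likely_than_not(rr):
    """True when the attributable fraction exceeds one half, which happens only if RR > 2.0."""
    return attributable_fraction(rr) > 0.5


if __name__ == "__main__":
    for rr in (1.5, 2.0, 3.0):
        print(f"RR={rr}: attributable fraction={attributable_fraction(rr):.3f}, "
              f"more likely than not={more_likely_than_not(rr)}")
```

A relative risk of 1.5 yields one in three, 2.0 yields exactly one half (not "more likely than not"), and only values above 2.0 push the fraction past one half.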
Only if a statistically significant association greater than 2.0 can be found can it be said that a particular factor under study is more likely than not to have caused any particular plaintiff's condition. In Cook v. United States, 545 F. Supp. 306 (N.D. Cal. 1982), the court applied these principles to plaintiff's allegation that her disease was caused by her immunization under the swine flu program. Both parties contested the magnitude of the relative risk developed from certain epidemiological studies. The court noted that:

Whenever the relative risk to vaccinated persons is greater than two times the risk to unvaccinated persons there is a greater than 50% chance that a given GBS case among vaccinees of that latency period is attributable to vaccination, thus sustaining plaintiff's burden of proof on causation.

545 F. Supp. at 308. The plaintiff in Cook, however, was unable to show that the relative risk remained above 2.0 at the time she was vaccinated. Accordingly, plaintiff's data were insufficient to satisfy her burden of proving causation. Id. at 316. Hence, it is the magnitude of the association, and not its significance, that corresponds to the "more likely than not" requirement for evidence to be probative.

D. Methodological Defects Of The Epidemiological Studies Are Not Probative Unless They Effect Statistically Significant Changes In The Conclusions Of The Studies

Merrell Dow also anticipates that plaintiffs will attack the studies on the basis of a number of alleged methodological defects or flaws. These attacks, however, are neither probative nor material absent evidence that the conclusions reached by the authors of those studies would have been altered had those defects not been present.
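Whether correcting an alleged flaw would alter a study's conclusion can be framed as a before-and-after comparison of the study's confidence interval. The Python sketch below is hypothetical throughout: the counts are invented, the "correction" is simply a reclassification of a few cases from unexposed to exposed, and the interval again uses Woolf's logarithmic approximation for a case-control odds ratio, which is an assumed method. It reports a changed conclusion only when the original table shows no significant positive association at the 95% level and the corrected table does.

```python
import math


def or_with_ci95(a, b, c, d):
    """Odds ratio and approximate 95% interval (Woolf's log method) for a case-control table:
    a = exposed cases, b = exposed controls, c = unexposed cases, d = unexposed controls."""
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return or_, math.exp(math.log(or_) - 1.96 * se), math.exp(math.log(or_) + 1.96 * se)


def correction_changes_conclusion(original, corrected):
    """True only if the original table's lower 95% limit is at or below 1.0
    and the corrected table's lower 95% limit is above 1.0."""
    _, lower_original, _ = or_with_ci95(*original)
    _, lower_corrected, _ = or_with_ci95(*corrected)
    return lower_original <= 1.0 and lower_corrected > 1.0


if __name__ == "__main__":
    original = (30, 970, 20, 980)
    small_change = (31, 970, 19, 980)  # hypothetical: one case reclassified as exposed
    large_change = (38, 970, 12, 980)  # hypothetical: eight cases reclassified as exposed
    for label, table in (("small", small_change), ("large", large_change)):
        or_, lo, hi = or_with_ci95(*table)
        print(f"{label}: OR={or_:.2f}, 95% CI=({lo:.2f}, {hi:.2f}), "
              f"changes conclusion={correction_changes_conclusion(original, table)}")
```

In this hypothetical, the one-case correction changes the raw data but leaves the lower limit below 1.0 (about 0.92), so the conclusion is unchanged; only the larger reclassification moves the lower limit above 1.0.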
In view of the difficulty the jury will have in understanding and digesting the scientific evidence itself, let alone any alleged defects in that evidence, it is necessary that the Court establish reasonable ground rules governing the admissibility of this evidence.

1. The Court Has Great Latitude In Excluding Evidence That Will Mislead The Jury, Confuse The Issues, Or Waste Time

Evidence should be excluded where it might create undue prejudice or confuse or mislead the jury. Douglas v. United States, 386 A.2d 289 (D.C. 1978); United States v. Margiotta, 662 F.2d 131, 143 (2d Cir. 1981), cert. denied, 461 U.S. 913 (1983) (state law violations excluded in prosecution for violation of federal law because of prejudicial effect); Rigby v. Beech Aircraft Co., 548 F.2d 288 (10th Cir. 1977) (evidence of defects other than those in issue would have confused the jury); E. Cleary, McCormick on Evidence, 185 (2d ed. 1972); 6 Wigmore On Evidence 1864, 1865, 1904 (1976). These principles are expressly embodied in Federal Rule of Evidence 403, which permits the court to exclude evidence "if its probative value is substantially outweighed by the danger of unfair prejudice, confusion of the issues, or misleading the jury, or by considerations of undue delay, waste of time, or needless presentation of cumulative evidence."

In conjunction with that power, the court has broad discretionary authority over the presentation of evidence. 6 Wigmore On Evidence 1867 (1976); Griffin v. United States, 164 F.2d 903, 904 (D.C. Cir. 1947), cert. denied, 333 U.S. 857 (1948). As noted at page 6, supra, Rule 611 of the Federal Rules of Evidence grants the court broad discretion to exercise reasonable control over the mode and order of presenting evidence. The court noted in Griffin, construing Fed. R. Evid. 611, that it is the duty of the trial judge to see that the facts of the case are properly developed and that the jury is not confused:

It cannot be too often repeated, or too strongly emphasized, that the function of the federal trial judge is not that of an umpire or a moderator at a town meeting. . . . He sits to see that justice is done in the cases heard before him; and it is his duty to see that the case on trial is presented in such a way as to be understood by the jury, as well as himself. . . . [The trial judge] has no more important function than to see that the facts are properly developed and that their bearing upon the question at issue are clearly understood by the jury.

Griffin, 164 F.2d at 904-05, quoting Simon v. United States, 123 F.2d 80, 83 (4th Cir. 1941), cert. denied, 314 U.S. 694 (1941). The court's discretionary power to control the presentation of evidence has been exercised to require a litigant to present a portion of the evidence he might not otherwise present in order to make his other evidence comprehensible. Sweitlowich v. County of Bucks, 610 F.2d 1157 (3d Cir. 1979); Baker v. United States, 401 F.2d 958 (D.C. Cir. 1968), cert. denied, 400 U.S. 965 (1970). Similarly, it is necessary in this case that the Court exercise its power to prevent plaintiffs' preemptive attack based on irrelevant and immaterial allegations of methodological flaws in the epidemiological evidence.

2. Plaintiffs' Alleged Methodological Flaws Are Irrelevant On The Issue Of Causation

Epidemiology is an imperfect science. It is virtually impossible to eliminate all confounding factors, to match the groups being studied perfectly, to manage a large sample, to police the collection of data with perfection, and to anticipate every problem in the methodology. Some of the "criticisms" advanced by plaintiffs' witness, Dr. Done, would be true of almost all epidemiological studies. Other alleged flaws are simply disagreements of judgment.
All are obvious points of which the scientists evaluating the studies would be well aware. The studies relied on by Merrell Dow were published in scientific journals. Most of these journals are peer-review journals, which will not publish work until after it has been closely scrutinized by a critical group of scientist-editors and found to have substantial merit. Dr. Done has never authored any such publication involving an epidemiological study of Bendectin. Nonetheless, his criticisms include:

1. Use of mothers of children with birth defects, rather than mothers of normal children, as a control group in the Cordero and Oakley study, Appendix I, Exhibit 13;
2. Use of a large population which includes a relatively small number of Bendectin mothers: Heinonen, Appendix I, Exhibit 4; Smithells, Appendix I, Exhibit 7; Michaelis, Appendix I, Exhibit 8;
3. Reliance on prescriptions as evidence of ingestion of the drug: Smithells, Appendix I, Exhibit 7; Jick, Appendix I, Exhibit 15;
4. Failure to identify possible over-the-counter use of drugs containing Doxylamine or possible Bendectin use among controls (all studies except those based on questionnaires); and
5. Dilution of effects on specific periods of organogenesis by using too long a test period, such as the first trimester (all except Smithells).

Each of these observations reflects some of the compromises necessary in epidemiological studies. In No. 1 above, Cordero and Oakley wished to minimize recall bias, which favors the memory of mothers whose children have birth defects. In No. 2, the control populations being studied were immense. The larger control populations provide more powerful statistical analysis and greater certainty in determining the background rates for specific categories of birth defects. In No. 3, records of filled prescriptions, if computerized, are at least complete and relatively foolproof.
The problem of actual ingestion can never be solved unless the investigator is observing each mother while she takes the pill. In most of the studies, prescription use was spot-checked by interview or questionnaire. The problem identified in No. 4, of unreported drugs, is present in any study, but is reduced by conference and questionnaire. Finally, the problem of dilution, No. 5, is also a compromise. Because of the uncertainty as to the time of conception and the interest in studying more than one kind of birth defect, some dilution is necessary in order that relevant data not be omitted.

These points are not flaws but, rather, reflect compromises made by the authors on the basis of their professional judgment. To the extent that these compromises are shortcomings, they have been openly discussed. They are not quantifiable in precise terms. In the absence of repeated, confirmed, statistically significant associations as to categories including limb reduction defects, these observations are no more than cautionary. While any one of them may be used to argue that an additional study is needed, Bendectin has been thoroughly studied by a significant number of investigators in more than 30 epidemiological studies. None of these alleged flaws would change the outcome of any of the studies at which they are leveled. In the absence of proof of statistically significant changes in the results of the studies, these arguments should be left to plaintiffs' impeachment of Merrell Dow's proof. They do not constitute affirmative evidence of causation.

Plaintiffs' alleged defects are not relevant to negligence or any other basis of liability. Openly discussed methodological compromises are expected in almost all epidemiological studies. There is no evidence that the data of any of the studies were misread. Interpretation of that data was a matter of scientific judgment.
Over twenty years of continued study shows overwhelming, confirmatory proof of the lack of an association between Bendectin and an increased incidence of limb reduction defects. Reliance on these studies, notwithstanding these open "compromises," is not negligent and does not raise an issue of fact.

3. The Complexity Of The Subject Matter Compels Exclusion Of Plaintiffs' Conjectured Defects Even Were They Marginally Relevant

Methodological criticisms, unless they result in a statistically significant alteration of the study's conclusion, will confuse the jury. In the Court's Memorandum Opinion of February 25, 1983 in Koller, Judge Johnson considered the foundation which must be established before plaintiffs may introduce in their case in chief evidence of alleged methodological flaws and other weaknesses of epidemiological studies. The Court held that attacks on the epidemiological and animal studies of Bendectin:

may not be presented in plaintiffs' case in chief unless plaintiffs first establish a foundation that the particular flaw would alter the conclusions of the study in a statistically significant manner. In other words, it will be insufficient to suggest that methodological errors significantly weaken the conclusions of the epidemiological studies. To be admissible in plaintiffs' case in chief, it must be demonstrated initially that the study, absent the methodological error, would indicate that Bendectin is a teratogen at the appropriate level of statistical significance.

Koller, mem. op. at 2-3, Exhibit B. Judge Johnson preserved for plaintiffs their right to present in rebuttal the weaknesses of the studies. The Court in Koller properly required, nonetheless, that such rebuttal evidence be material. The Court limited the nature of the rebuttal:

[P]laintiffs will not be permitted to cite trivial flaws absent a preliminary showing that the alleged flaws would materially weaken the conclusions of the studies.
That some of the raw data would have changed if the alleged flaws had not occurred also is insufficient to establish materiality.

Id. at 3 n., Exhibit B. The Court stated further:

Plaintiffs have no right to attempt to preempt defendants' anticipated defense in their case in chief by advancing alleged weaknesses and studies that are favorable to defendant. This approach would confuse the jury and would badly obscure the fundamental requirement that plaintiffs prove that Bendectin is a teratogen that caused the birth defects of Ann Koller. Regardless of how serious plaintiffs believe the flaws in the epidemiological [and] animal studies are, these flaws are irrelevant to plaintiffs' case in chief unless plaintiffs can employ these flaws to show that the studies affirmatively demonstrate that Bendectin causes birth defects.

Id. at 3-4, Exhibit B. Without this limitation, plaintiffs would be able, through the use of irrelevant and immaterial evidence, effectively to shift the burden of proving that Bendectin is safe to Merrell Dow, while at the same time attempting to destroy the substantial evidence amassed by the scientific community exonerating Bendectin. Absent some effect on the conclusions reached by the authors of these studies, the alleged defects are simply immaterial.

Even a straightforward presentation of the epidemiological evidence in accordance with generally accepted scientific principles will be extremely difficult for a lay jury to understand and digest. If confounded by plaintiffs' attempts to needle the studies by injecting immaterial alleged methodological flaws, the jury will become hopelessly confused. Unless these flaws would have some effect on the outcome of the studies, they do not support plaintiffs' position on causation. The alleged flaws should be relegated to their only proper position in this case: plaintiffs' rebuttal evidence.
If plaintiffs are able to show some effect on the outcome that would be statistically significant, only then will these alleged defects be relevant and material.

III. CONCLUSION

For the foregoing reasons, defendants respectfully request that the Court enter an order limiting the statistical evidence to that which is statistically significant at a confidence level of 95% (statistically significant at the 5% level), and that plaintiffs be precluded in their case in chief from raising claims of methodological error in the epidemiological studies unless (1) correction of those alleged errors would vary the outcome of the studies and (2) plaintiffs' methodological corrections would cause the studies' results to reach a level of statistical significance.

Dated: July 3, 1986

Respectfully submitted,

MARK L. AUSTRIAN
Bar No. 346593
PATRICK J. COYNE
Bar No. 366841
COLLIER, SHANNON, RILL & SCOTT
1055 Thomas Jefferson Street, N.W.
Washington, D.C. 20007
(202) 342-8400
Attorneys for Defendants Merrell Dow Pharmaceuticals Inc. and Standard Drug Co., Inc.