
American Society for Quality

A Comparison of Three Methods for Selecting Values of Input Variables in the Analysis of
Output from a Computer Code
Author(s): M. D. McKay, R. J. Beckman, W. J. Conover
Source: Technometrics, Vol. 42, No. 1, Special 40th Anniversary Issue (Feb., 2000), pp. 55-61
Published by: American Statistical Association and American Society for Quality
Stable URL: http://www.jstor.org/stable/1271432

A Comparison of Three Methods for Selecting Values of Input Variables in the Analysis of Output From a Computer Code

M. D. McKAY and R. J. BECKMAN
Los Alamos Scientific Laboratory
P.O. Box 1663
Los Alamos, NM 87545

W. J. CONOVER
Department of Mathematics
Texas Tech University
Lubbock, TX 79409

Two types of sampling plans are examined as alternatives to simple random sampling in Monte
Carlo studies. These plans are shown to be improvements over simple random sampling with respect
to variance for a class of estimators which includes the sample mean and the empirical distribution
function.

KEY WORDS: Latin hypercube sampling; Sampling techniques; Simulation techniques; Variance
reduction.

1. INTRODUCTION

Numerical methods have been used for years to provide approximate solutions to fluid flow problems that defy analytical solutions because of their complexity. A mathematical model is constructed to resemble the fluid flow problem, and a computer program (called a "code"), incorporating methods of obtaining a numerical solution, is written. Then for any selection of input variables X = (X1,..., XK) an output variable Y = h(X) is produced by the computer code. If the code is accurate, the output Y resembles what the actual output would be if an experiment were performed under the conditions X. It is often impractical or impossible to perform such an experiment. Moreover, the computer codes are sometimes sufficiently complex so that a single set of input variables may require several hours of time on the fastest computers presently in existence in order to produce one output. We should mention that a single output Y is usually a graph Y(t) of output as a function of time, calculated at discrete time points t, t0 ≤ t ≤ t1.

When modeling real world phenomena with a computer code one is often faced with the problem of what values to use for the inputs. This difficulty can arise from within the physical process itself when system parameters are not constant, but vary in some manner about nominal values. We model our uncertainty about the values of the inputs by treating them as random variables. The information desired from the code can be obtained from a study of the probability distribution of the output Y(t). Consequently, we model the "numerical" experiment by Y(t) as an unknown transformation h(X) of the inputs X, which have a known probability distribution F(x) for x ∈ S. Obviously several values of X, say X1,..., XN, must be selected as successive input sets in order to obtain the desired information concerning Y(t). When N must be small because of the running time of the code, the input variables should be selected with great care.

The next section describes three methods of selecting (sampling) input variables. Sections 3, 4, and 5 are devoted to comparing the three methods with respect to their performance in an actual computer code.

The computer code used in this paper was developed in the Hydrodynamics Group of the Theoretical Division at the Los Alamos Scientific Laboratory to study reactor safety (Hirt and Romero 1975). The computer code is named SOLA-PLOOP and is a one-dimensional version of another code, SOLA (Hirt, Nichols, and Romero 1975). The code was used by us to model the blowdown depressurization of a straight pipe filled with water at fixed initial temperature and pressure. Input variables include: X1, phase change rate; X2, drag coefficient for drift velocity; X3, number of bubbles per unit volume; and X4, pipe roughness. The input variables are assumed to be uniformly distributed over given ranges. The output variable is pressure as a function of time, where the initial time t0 is the time the pipe ruptures and depressurization initiates, and the final time t1 is 20 milliseconds later. The pressure is recorded at 0.1 millisecond time intervals. The code was used repeatedly so that the accuracy and precision of the three sampling methods could be compared.

2. A DESCRIPTION OF THE THREE METHODS USED FOR SELECTING THE VALUES OF INPUT VARIABLES

From the many different methods of selecting the values of input variables, we have chosen three that have considerable intuitive appeal. These are called random sampling, stratified sampling, and Latin hypercube sampling.

Random Sampling. Let the input values X1,..., XN be a random sample from F(x). This method of sampling is perhaps the most obvious, and an entire body of statistical literature may be used in making inferences regarding the distribution of Y(t).
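To make this setup concrete, here is a minimal sketch of the random sampling plan for a SOLA-PLOOP-like problem: N = 16 input vectors are drawn independently from uniform distributions and fed to a stand-in for the code. The ranges and the function `toy_code` are illustrative assumptions only; the paper does not specify them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed, purely illustrative ranges for X1..X4; the paper states only that
# each input is uniform over a given range.
lower = np.array([0.1, 0.5, 1.0e3, 1.0e-5])   # X1, X2, X3, X4
upper = np.array([1.0, 2.0, 1.0e5, 1.0e-3])

def toy_code(x):
    """Stand-in for the expensive code Y = h(X); returns a single number."""
    return x[0] + 0.5 * x[1] + 1.0e-4 * x[2] + 10.0 * x[3]

N = 16
# Random sampling: each row of X is an independent draw from F(x).
X = rng.uniform(lower, upper, size=(N, 4))
Y = np.array([toy_code(x) for x in X])
print("sample mean of Y:", Y.mean())
```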


Stratified Sampling. Using stratified sampling, all areas of the sample space of X are represented by input values. Let the sample space S of X be partitioned into I disjoint strata Si. Let pi = P(X ∈ Si) represent the size of Si. Obtain a random sample Xij, j = 1,..., ni from Si. Then of course the ni sum to N. If I = 1, we have random sampling over the entire sample space.

Latin Hypercube Sampling. The same reasoning that led to stratified sampling, ensuring that all portions of S were sampled, could lead further. If we wish to ensure also that each of the input variables Xk has all portions of its distribution represented by input values, we can divide the range of each Xk into N strata of equal marginal probability 1/N, and sample once from each stratum. Let this sample be Xkj, j = 1,..., N. These form the Xk component, k = 1,..., K, in Xi, i = 1,..., N. The components of the various Xk's are matched at random. This method of selecting input values is an extension of quota sampling (Steinberg 1963), and can be viewed as a K-dimensional extension of Latin square sampling (Raj 1968).

One advantage of the Latin hypercube sample appears when the output Y(t) is dominated by only a few of the components of X. This method ensures that each of those components is represented in a fully stratified manner, no matter which components might turn out to be important.

We mention here that the N intervals on the range of each component of X combine to form N^K cells which cover the sample space of X. These cells, which are labeled by coordinates corresponding to the intervals, are used when finding the properties of the sampling plan.
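The following is a minimal sketch of the Latin hypercube construction just described, for inputs that are uniform on [0, 1] (an assumption made here for simplicity; for other input distributions each stratum draw would be pushed through the inverse CDF). Each component receives one value from each of its N equal-probability strata, and the columns are then matched at random.

```python
import numpy as np

def latin_hypercube(N, K, rng):
    """Return an N x K Latin hypercube sample on the unit hypercube."""
    # One draw inside each of the N equal-probability strata, per component.
    u = (np.arange(N)[:, None] + rng.uniform(size=(N, K))) / N
    # Match the components of the various X_k at random.
    for k in range(K):
        u[:, k] = u[rng.permutation(N), k]
    return u

rng = np.random.default_rng(1)
U = latin_hypercube(16, 4, rng)

# Check: every component has exactly one value in each of its 16 strata.
strata = np.sort(np.floor(U * 16).astype(int), axis=0)
assert np.array_equal(strata, np.tile(np.arange(16)[:, None], (1, 4)))
```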
2.1 Estimators

In the Appendix (Section 8), stratified sampling and Latin hypercube sampling are examined and compared to random sampling with respect to the class of estimators of the form

T(Y1,..., YN) = (1/N) Σ_{i=1}^N g(Yi),

where g(·) is an arbitrary function. If g(Y) = Y then T represents the sample mean, which is used to estimate E(Y). If g(Y) = Y^r we obtain the rth sample moment. By letting g(Y) = 1 for Y ≤ y, 0 otherwise, we obtain the usual empirical distribution function at the point y. Our interest is centered around these particular statistics.

Let τ denote the expected value of T when the Yi's constitute a random sample from the distribution of Y = h(X). We show in the Appendix that both stratified sampling and Latin hypercube sampling yield unbiased estimators of τ.

If TR is the estimate of τ from a random sample of size N, and TS is the estimate from a stratified sample of size N, then Var(TS) ≤ Var(TR) when the stratified plan uses equal probability strata with one sample per stratum (all pi = 1/N and ni = 1). No direct means of comparing the variance of the corresponding estimator from Latin hypercube sampling, TL, to Var(TS) has been found. However, the following theorem, proved in the Appendix, relates the variances of TL and TR.

Theorem. If Y = h(X1,..., XK) is monotonic in each of its arguments, and g(Y) is a monotonic function of Y, then Var(TL) ≤ Var(TR).

2.2 The SOLA-PLOOP Example

The three sampling plans were compared using the SOLA-PLOOP computer code with N = 16. First a random sample consisting of 16 values of X = (X1, X2, X3, X4) was selected, entered as inputs, and 16 graphs of Y(t) were observed as outputs. These output values were used in the estimators.

For the stratified sampling method the range of each input variable was divided at the median into two parts of equal probability. The combinations of ranges thus formed produced 2^4 = 16 strata Si. One observation was obtained at random from each Si as input, and the resulting outputs were used to obtain the estimates.

To obtain the Latin hypercube sample the range of each input variable Xi was stratified into 16 intervals of equal probability, and one observation was drawn at random from each interval. These 16 values for the 4 input variables were matched at random to form 16 inputs, and thus 16 outputs from the code.

The entire process of sampling and estimating for the three selection methods was repeated 50 times in order to get some idea of the accuracies and precisions involved. The total computer time spent in running the SOLA-PLOOP code in this study was 7 hours on a CDC-6600. Some of the standard deviation plots appear to be inconsistent with the theoretical results. These occasional discrepancies are believed to arise from the non-independence of the estimators over time and the small sample sizes.

3. ESTIMATING THE MEAN

The goodness of an unbiased estimator of the mean can be measured by the size of its variance. For each sampling method, the estimator of E(Y(t)) is of the form

Ȳ(t) = (1/N) Σ_{i=1}^N Yi(t),   (3.1)

where Yi(t) = h(Xi), i = 1,..., N.

In the case of the stratified sample, the Xi comes from stratum Si, with pi = 1/N and ni = 1. For the Latin hypercube sample, the Xi is obtained in the manner described earlier. Each of the three estimators ȲR, ȲS, and ȲL is an unbiased estimator of E(Y(t)). The variances of the estimators are given in (3.2):

Var(ȲR(t)) = (1/N) Var(Y(t))

Var(ȲS(t)) = Var(ȲR(t)) − (1/N²) Σ_{i=1}^N (μi − μ)²

Var(ȲL(t)) = Var(ȲR(t)) + ((N − 1)/N)(1/(N^K(N − 1)^K)) Σ_R (μi − μ)(μj − μ),   (3.2)

where μ = E(Y(t)), μi = E(Y(t) | X ∈ Si) in the stratified sample, or μi = E(Y(t) | X ∈ cell i) in the Latin hypercube sample, and R means the restricted space of all pairs (μi, μj) having no cell coordinates in common.
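To make this class of estimators concrete, the short sketch below evaluates T = (1/N) Σ g(Yi) for the three choices of g mentioned in Section 2.1 (identity for the mean, a power for the rth moment, and an indicator for the empirical distribution function); the output values are arbitrary numbers used only to show the mechanics.

```python
import numpy as np

def T(outputs, g):
    """Estimator of the form T(Y1,...,YN) = (1/N) * sum of g(Yi)."""
    y = np.asarray(outputs, dtype=float)
    return np.mean(g(y))

Y = np.array([55.2, 61.8, 47.3, 58.9, 52.4, 60.1])        # illustrative outputs

sample_mean   = T(Y, lambda y: y)                          # g(Y) = Y
second_moment = T(Y, lambda y: y ** 2)                     # g(Y) = Y^r with r = 2
edf_at_58     = T(Y, lambda y: (y <= 58.0).astype(float))  # EDF at the point y = 58
print(sample_mean, second_moment, edf_at_58)
```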

Figure 1. Estimating the Mean: The Sample Mean of ȲR(t), ȲS(t), and ȲL(t), plotted against time.

Figure 3. Estimating the Variance: The Sample Mean of S²R(t), S²S(t), and S²L(t), plotted against time.

For the SOLA-PLOOP computer code the means and standard deviations, based on 50 observations, were computed for the estimators just described. Comparative plots of the means are given in Figure 1. All of the plots of the means are comparable, demonstrating the unbiasedness of the estimators.

Comparative plots of the standard deviations of the estimators are given in Figure 2. The standard deviation of ȲS(t) is smaller than that of ȲR(t), as expected. However, ȲL(t) clearly demonstrates superiority as an estimator in this example, with a standard deviation roughly one-fourth that of the random sampling estimator.

4. ESTIMATING THE VARIANCE

For each sampling method, the form of the estimator of the variance is

S²(t) = (1/N) Σ_{i=1}^N (Yi(t) − Ȳ(t))²,   (4.1)

and its expectation is

E(S²(t)) = Var(Y(t)) − Var(Ȳ(t)),   (4.2)

where Ȳ(t) is one of ȲR(t), ȲS(t), or ȲL(t).

In the case of the random sample, it is well known that N S²(t)/(N − 1) is an unbiased estimator of the variance of Y(t). The bias in the case of the stratified sample is unknown. However, because Var(ȲS(t)) ≤ Var(ȲR(t)),

(1 − 1/N) Var(Y(t)) ≤ E(S²(t)) ≤ Var(Y(t)).   (4.3)
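The sketch below imitates this repeated-sampling comparison on a cheap stand-in for the code: each plan is drawn 200 times with N = 16 inputs uniform on [0, 1]^4, and the spread of the resulting sample means is compared across plans. The function `h`, the unit-cube ranges, and the number of repetitions are illustrative assumptions (the paper uses SOLA-PLOOP and 50 repetitions); because `h` is monotone in each argument, the Latin hypercube plan should show the smallest spread, in line with (3.2) and the theorem of Section 2.1.

```python
import numpy as np

rng = np.random.default_rng(2)
N, K, REPS = 16, 4, 200

def h(x):
    """Cheap, monotone stand-in for the computer code (an assumption)."""
    return np.exp(x[:, 0]) + 2.0 * x[:, 1] + x[:, 2] * x[:, 3]

def random_plan():
    return rng.uniform(size=(N, K))

def stratified_plan():
    # 2^4 = 16 strata from splitting each range at the median,
    # with one observation per stratum, as in Section 2.2.
    corners = np.array(np.meshgrid(*[[0, 1]] * K)).reshape(K, -1).T
    return (corners + rng.uniform(size=(N, K))) / 2.0

def lhs_plan():
    u = (np.arange(N)[:, None] + rng.uniform(size=(N, K))) / N
    for k in range(K):
        u[:, k] = u[rng.permutation(N), k]
    return u

for name, plan in [("random", random_plan),
                   ("stratified", stratified_plan),
                   ("Latin hypercube", lhs_plan)]:
    means = np.array([h(plan()).mean() for _ in range(REPS)])
    print(f"{name:16s} std. dev. of sample mean = {means.std():.4f}")
```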

Figure 2. Estimating the Mean: The Standard Deviation of ȲR(t), ȲS(t), and ȲL(t), plotted against time.

Figure 4. Estimating the Variance: The Standard Deviation of S²R(t), S²S(t), and S²L(t), plotted against time.



The bias in the Latin hypercube plan is also unknown, but for the SOLA-PLOOP example it was small. Variances for these estimators were not found.

Again using the SOLA-PLOOP example, means and standard deviations (based on 50 observations) were computed. The mean plots are given in Figure 3. They indicate that all three estimators are in relative agreement concerning the quantities they are estimating. In terms of standard deviations of the estimators, Figure 4 shows that, although stratified sampling yields about the same precision as does random sampling, Latin hypercube furnishes a clearly better estimator.

5. ESTIMATING THE DISTRIBUTION FUNCTION

The distribution function, D(y, t), of Y(t) = h(X) may be estimated by the empirical distribution function. The empirical distribution function can be written as

G(y, t) = (1/N) Σ_{i=1}^N u(y − Yi(t)),   (5.1)

where u(z) = 1 for z ≥ 0 and is zero otherwise. Since equation (5.1) is of the form of the estimators in Section 2.1, the expected value of G(y, t) under the three sampling plans is the same, and under random sampling the expected value of G(y, t) is D(y, t).

The variances of the three estimators are given in (5.2). Di again refers to either stratum i or cell i, as appropriate, and R represents the same restricted space as it did in (3.2).

Var(GR(y, t)) = (1/N) D(y, t)(1 − D(y, t))

Var(GS(y, t)) = Var(GR(y, t)) − (1/N²) Σ_{i=1}^N (Di(y, t) − D(y, t))²

Var(GL(y, t)) = Var(GR(y, t)) + ((N − 1)/N)(1/(N^K(N − 1)^K)) Σ_R (Di(y, t) − D(y, t))(Dj(y, t) − D(y, t)).   (5.2)

As with the cases of the mean and variance estimators, the distribution function estimators were compared for the three sampling plans. Figures 5 and 6 give the means and standard deviations of the estimators at t = 1.4 ms. This time point was chosen to correspond to the time of maximum variance in the distribution of Y(t). Again the estimates obtained from a Latin hypercube sample appear to be more precise in general than the other two types of estimates.

Figure 5. Estimating the CDF: The Sample Mean of GR(y, t), GS(y, t), and GL(y, t) at t = 1.4 ms, plotted against pressure.

Figure 6. Estimating the CDF: The Standard Deviation of GR(y, t), GS(y, t), and GL(y, t) at t = 1.4 ms, plotted against pressure.
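A short sketch of the empirical distribution function in (5.1), evaluated over a grid of pressure values: G(y) is simply the fraction of outputs not exceeding y. The 16 output values are arbitrary stand-ins for the pressures observed at t = 1.4 ms.

```python
import numpy as np

def G(y_grid, outputs):
    """Empirical CDF (5.1): G(y) = (1/N) * sum of u(y - Yi), with u(z) = 1 for z >= 0."""
    y_grid = np.asarray(y_grid, dtype=float)
    outputs = np.asarray(outputs, dtype=float)
    return np.mean(outputs[None, :] <= y_grid[:, None], axis=1)

# Illustrative stand-ins for the N = 16 output pressures at t = 1.4 ms.
Y = np.array([52.1, 47.8, 63.0, 55.4, 49.9, 58.2, 61.7, 45.3,
              57.0, 50.6, 48.4, 59.9, 62.2, 46.7, 53.8, 56.5])
grid = np.linspace(40.0, 70.0, 7)
for y, g in zip(grid, G(grid, Y)):
    print(f"G({y:4.1f}) = {g:.3f}")
```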
6. DISCUSSION AND CONCLUSIONS

We have presented three sampling plans and associated estimators of the mean, the variance, and the population distribution function of the output of a computer code when the inputs are treated as random variables. The first method is simple random sampling. The second method involves stratified sampling and improves upon the first method. The third method is called here Latin hypercube sampling. It is an extension of quota sampling (Steinberg 1963), and it is a first cousin to the "random balance" design discussed by Satterthwaite (1959), Budne (1959), Youden et al. (1959), and Anscombe (1959), to the highly fractionalized factorial designs discussed by Ehrenfeld and Zacks (1961, 1967), Dempster (1960, 1961), and Zacks (1963, 1964), and to lattice sampling as discussed by Jessen (1975). This third method improves upon simple random sampling when certain monotonicity conditions hold, and it appears to be a good method to use for selecting values of input variables.

7. ACKNOWLEDGMENTS

We extend a special thanks to Ronald K. Lohrding for his early suggestions related to this work and for his continuing support and encouragement. We also thank our colleagues Larry Bruckner, Ben Duran, C. Phive, and Tom Boullion for their discussions concerning various aspects of the problem, and Dave Whiteman for assistance with the computer.

This paper was prepared under the support of the Analysis Development Branch, Division of Reactor Safety Research, Nuclear Regulatory Commission.

8. APPENDIX

In the sections that follow we present some general results about stratified sampling and Latin hypercube sampling in order to make comparisons with simple random sampling. We move from the general case of stratified sampling to stratified sampling with proportional allocation, and then to proportional allocation with one observation per stratum. We examine Latin hypercube sampling for the equal marginal probability strata case only.

8.1 Type I Estimators

Let X denote a K variate random variable with probability density function (pdf) f(x) for x ∈ S. Let Y denote a univariate transformation of X given by Y = h(X). In the context of this paper we assume

X ~ f(x), x ∈ S     (KNOWN pdf)
Y = h(X)            (UNKNOWN but observable transformation of X).

The class of estimators to be considered are those of the form

T(U1,..., UN) = (1/N) Σ_{i=1}^N g(Ui),   (8.1)

where g(·) is an arbitrary, known function. In particular we use g(u) = u^r to estimate moments, and g(u) = 1 for u ≥ 0, = 0 elsewhere, to estimate the distribution function.

The sampling schemes described in the following sections will be compared to random sampling with respect to T. The symbol TR denotes T(Y1,..., YN) when the arguments Y1,..., YN constitute a random sample of Y. The mean and variance of TR are denoted by τ and σ²/N. The statistic T given by (8.1) will be evaluated at arguments arising from stratified sampling to form TS, and at arguments arising from Latin hypercube sampling to form TL. The associated means and variances will be compared to those for random sampling.

8.2 Stratified Sampling

Let the range space, S, of X be partitioned into I disjoint subsets Si of size pi = P(X ∈ Si), with Σ_{i=1}^I pi = 1.

Let Xij, j = 1,..., ni, be a random sample from stratum Si. That is, let the Xij be iid with density f(x)/pi for x ∈ Si, and zero density elsewhere. The corresponding values of Y are denoted by Yij = h(Xij), and the strata means and variances of g(Y) are denoted by

μi = E(g(Yij)) = ∫_{Si} g(y)(1/pi) f(x) dx

σi² = Var(g(Yij)) = ∫_{Si} (g(y) − μi)²(1/pi) f(x) dx.

It is easy to see that if we use the general form

TS = Σ_{i=1}^I (pi/ni) Σ_{j=1}^{ni} g(Yij),

then TS is an unbiased estimator of τ with variance given by

Var(TS) = Σ_{i=1}^I (pi²/ni) σi².   (8.2)

The following results can be found in Tocher (1963).

Stratified Sampling with Proportional Allocation. If the probability sizes, pi, of the strata and the sample sizes, ni, are chosen so that ni = pi N, proportional allocation is achieved. In this case (8.2) becomes

Var(TS) = Var(TR) − (1/N) Σ_{i=1}^I pi(μi − τ)².   (8.3)

Thus, we see that stratified sampling with proportional allocation offers an improvement over random sampling, and that the variance reduction is a function of the differences between the strata means, μi, and the overall mean τ.

Proportional Allocation with One Sample per Stratum. Any stratified plan which employs subsampling, ni > 1, can be improved by further stratification. When all ni = 1, (8.3) becomes

Var(TS) = Var(TR) − (1/N²) Σ_{i=1}^N (μi − τ)².   (8.4)

8.3 Latin Hypercube Sampling

In stratified sampling the range space S of X can be arbitrarily partitioned to form strata. In Latin hypercube sampling the partitions are constructed in a specific manner using partitions of the ranges of each component of X. We will only consider the case where the components of X are independent.

Let the ranges of each of the K components of X be partitioned into N intervals of probability size 1/N. The Cartesian product of these intervals partitions S into N^K cells, each of probability size N^{-K}. Each cell can be labeled by a set of K cell coordinates mi = (mi1, mi2,..., miK), where mij is the interval number of component Xj represented in cell i. A Latin hypercube sample of size N is obtained from a random selection of N of the cells m1,..., mN, with the condition that for each j the set {mij, i = 1,..., N} is a permutation of the integers 1,..., N. One random observation is made in each cell. The density function of X, given X ∈ cell i, is N^K f(x) for x ∈ cell i and zero otherwise. The marginal (unconditional) distribution of Yi(t) is easily seen to be the same as that for a randomly drawn X, as follows:

P(Y ≤ y) = Σ_{all cells q} P(Yi ≤ y | X ∈ cell q) P(X ∈ cell q)
         = Σ_{all cells q} [∫_{h(x)≤y, x∈cell q} N^K f(x) dx] (1/N^K)
         = ∫_{h(x)≤y} f(x) dx.
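To connect this cell description with something directly checkable, the sketch below repeatedly draws the N cells of a Latin hypercube selection (one random permutation per component), verifies the permutation condition on every coordinate, and estimates the probability that a fixed cell is selected, which should be close to N^(1-K) as used in the next part of the appendix. The values of N, K, and the number of trials are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
N, K, TRIALS = 4, 3, 20000

def lhs_cells(N, K, rng):
    """Cell coordinates (m_i1,...,m_iK), i = 1..N: one permutation per component."""
    return np.column_stack([rng.permutation(N) for _ in range(K)])  # 0-based labels

target = (0,) * K          # an arbitrary fixed cell
hits = 0
for _ in range(TRIALS):
    cells = lhs_cells(N, K, rng)
    # Each coordinate of the selected cells is a permutation of 0,...,N-1.
    assert all(np.array_equal(np.sort(cells[:, k]), np.arange(N)) for k in range(K))
    hits += any(tuple(row) == target for row in cells)

print("estimated P(cell selected):", hits / TRIALS)
print("theoretical value N^(1-K): ", N ** (1 - K))
```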

From this we have TL as an unbiased estimator of τ. To arrive at a form for the variance of TL we introduce indicator variables wi, with wi = 1 if cell i is in the sample and wi = 0 if not. The estimator can now be written as

TL = (1/N) Σ_{i=1}^{N^K} wi g(Yi),   (8.5)

where Yi = h(Xi) and Xi ∈ cell i. The variance of TL is given by

Var(TL) = (1/N²) Σ_{i=1}^{N^K} Var(wi g(Yi)) + (1/N²) Σ_{i=1}^{N^K} Σ_{j≠i} Cov(wi g(Yi), wj g(Yj)).   (8.6)

The following results about the wi are immediate:

1. P(wi = 1) = 1/N^{K−1} = E(wi) = E(wi²), and Var(wi) = (1/N^{K−1})(1 − 1/N^{K−1}).

2. If wi and wj correspond to cells having no cell coordinates in common, then

E(wi wj) = E(wi wj | wj = 0) P(wj = 0) + E(wi wj | wj = 1) P(wj = 1) = 1/(N(N − 1))^{K−1}.

3. If wi and wj correspond to cells having at least one common cell coordinate, then E(wi wj) = 0.

Now

Var(wi g(Yi)) = E(wi²) Var(g(Yi)) + E²(g(Yi)) Var(wi),   (8.7)

so that

Σ_{i=1}^{N^K} Var(wi g(Yi)) = N^{−K+1} Σ_{i=1}^{N^K} E(g(Yi) − μi)² + (N^{−K+1} − N^{−2K+2}) Σ_{i=1}^{N^K} μi²,   (8.8)

where μi = E{g(Y) | X ∈ cell i}. Since

E(g(Yi) − μi)² = N^K ∫_{cell i} (g(y) − τ)² f(x) dx − (μi − τ)²,   (8.9)

we have

Σ_i Var(wi g(Yi)) = N Var(g(Y)) − N^{−K+1} Σ_i (μi − τ)² + (N^{−K+1} − N^{−2K+2}) Σ_i μi².   (8.10)

Furthermore,

Σ_{i=1}^{N^K} Σ_{j≠i} Cov(wi g(Yi), wj g(Yj)) = Σ_{i≠j} μi μj E{wi wj} − N^{−2K+2} Σ_{i≠j} μi μj,   (8.11)

which combines with (8.10) to give

Var(TL) = (1/N) Var(g(Y)) − N^{−K−1} Σ_i (μi − τ)² + (N^{−K−1} − N^{−2K}) Σ_i μi² + N^{−K−1}(N − 1)^{−K+1} Σ_R μi μj − N^{−2K} Σ_{i≠j} μi μj,   (8.12)

where R means the restricted space of N^K(N − 1)^K pairs (μi, μj) corresponding to cells having no cell coordinates in common. After some algebra, and with Σ_i μi = N^K τ, the final form for Var(TL) becomes

Var(TL) = Var(TR) + ((N − 1)/N)(1/(N^K(N − 1)^K)) Σ_R (μi − τ)(μj − τ).   (8.13)

Note that Var(TL) ≤ Var(TR) if and only if

(1/(N^K(N − 1)^K)) Σ_R (μi − τ)(μj − τ) ≤ 0,   (8.14)

which is equivalent to saying that the covariance between cells having no cell coordinates in common is negative. A sufficient condition for (8.14) to hold is given by the following theorem.

Theorem. If Y = h(X1,..., XK) is monotonic in each of its arguments, and if g(Y) is a monotonic function of Y, then Var(TL) ≤ Var(TR).

Proof. The proof employs a theorem by Lehmann (1966). Two functions r(x1,..., xK) and s(y1,..., yK) are said to be concordant in each argument if r and s either increase or decrease together as functions of xi and yi, with all xj, j ≠ i, and yj, j ≠ i, held fixed, for each i. Also, a pair of random variables (X, Y) are said to be negatively quadrant dependent if P(X ≤ x, Y ≤ y) ≤ P(X ≤ x) P(Y ≤ y) for all x, y. Lehmann's theorem states that if (i) (X1, Y1), (X2, Y2),..., (XK, YK) are independent, (ii) (Xi, Yi) is negatively quadrant dependent for all i, and (iii) X = r(X1,..., XK) and Y = s(Y1,..., YK) are concordant in each argument, then (X, Y) is negatively quadrant dependent.

We earlier described a stage-wise process for selecting cells for a Latin hypercube sample, where a cell was labeled by cell coordinates mi1,..., miK. Two cells (l1,..., lK) and (m1,..., mK) with no coordinates in common may be selected as follows. Randomly select two integers (R11, R21) without replacement from the first N integers 1,..., N. Let l1 = R11 and m1 = R21. Repeat the procedure to obtain (R12, R22), (R13, R23),..., (R1K, R2K) and let lk = R1k and mk = R2k. Thus two cells are randomly selected and lk ≠ mk for k = 1,..., K.

Note that the pairs (R1k, R2k), k = 1,..., K, are mutually independent. Also note that because

P(R1k ≤ x, R2k ≤ y) = ([x][y] − min([x], [y]))/(N(N − 1)) ≤ P(R1k ≤ x) P(R2k ≤ y),

where [·] represents the "greatest integer" function, each pair (R1k, R2k) is negatively quadrant dependent.

Let μ1 be the expected value of g(Y) within the cell designated by (l1,..., lK), and let μ2 be similarly defined for (m1,..., mK). Then μ1 = μ1(R11, R12,..., R1K) and μ2 = μ2(R21, R22,..., R2K) are concordant in each argument under the assumptions of the theorem. Lehmann's theorem then yields that μ1 and μ2 are negatively quadrant dependent. Therefore,

P(μ1 ≤ x, μ2 ≤ y) ≤ P(μ1 ≤ x) P(μ2 ≤ y).

Using Hoeffding's equation

Cov(X, Y) = ∫∫ [P(X ≤ x, Y ≤ y) − P(X ≤ x) P(Y ≤ y)] dx dy

(see Lehmann (1966) for a proof), we have Cov(μ1, μ2) ≤ 0. Since Var(TL) = Var(TR) + ((N − 1)/N) Cov(μ1, μ2), the theorem is proved.

Since g(t) as used in both Sections 3 and 5 is an increasing function of t, we can say that if Y = h(X) is a monotonic function of each of its arguments, Latin hypercube sampling is better than random sampling for estimating the mean and the population distribution function.
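As a small numerical companion to the proof, the sketch below enumerates all ordered pairs (R1, R2) drawn without replacement from 1,..., N, checks the closed form ([x][y] − min([x], [y]))/(N(N − 1)) for the joint CDF at integer points, and confirms both the negative quadrant dependence inequality and the negative covariance; N = 6 is an arbitrary choice.

```python
import numpy as np
from itertools import permutations

N = 6
pairs = list(permutations(range(1, N + 1), 2))   # ordered pairs, without replacement
p = 1.0 / len(pairs)                             # each pair has probability 1/(N(N-1))

def joint_cdf(x, y):
    return sum(p for a, b in pairs if a <= x and b <= y)

for x in range(1, N + 1):
    for y in range(1, N + 1):
        closed_form = (x * y - min(x, y)) / (N * (N - 1))
        assert abs(joint_cdf(x, y) - closed_form) < 1e-12
        # Negative quadrant dependence: joint CDF <= product of the marginals.
        assert joint_cdf(x, y) <= (x / N) * (y / N) + 1e-12

r1 = np.array([a for a, _ in pairs], dtype=float)
r2 = np.array([b for _, b in pairs], dtype=float)
print("Cov(R1, R2) =", np.mean(r1 * r2) - r1.mean() * r2.mean())   # negative
```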
[Received January 1977. Revised May 1978.]

REFERENCES

Anscombe, F. J. (1959), "Quick Analysis Methods for Random Balance Screening Experiments," Technometrics, 1, 195-209.
Budne, T. A. (1959), "The Application of Random Balance Designs," Technometrics, 1, 139-155.
Dempster, A. P. (1960), "Random Allocation Designs I: On General Classes of Estimation Methods," The Annals of Mathematical Statistics, 31, 885-905.
Dempster, A. P. (1961), "Random Allocation Designs II: Approximate Theory for Simple Random Allocation," The Annals of Mathematical Statistics, 32, 387-405.
Ehrenfeld, S., and Zacks, S. (1961), "Randomization and Factorial Experiments," The Annals of Mathematical Statistics, 32, 270-297.
Ehrenfeld, S., and Zacks, S. (1967), "Testing Hypotheses in Randomized Factorial Experiments," The Annals of Mathematical Statistics, 38, 1494-1507.
Hirt, C. W., Nichols, B. D., and Romero, N. C. (1975), "SOLA - A Numerical Solution Algorithm for Transient Fluid Flows," Scientific Laboratory Report LA-5852, Los Alamos National Laboratory, NM.
Hirt, C. W., and Romero, N. C. (1975), "Application of a Drift-Flux Model to Flashing in Straight Pipes," Scientific Laboratory Report LA-6005-MS, Los Alamos National Laboratory, NM.
Jessen, Raymond J. (1975), "Square and Cubic Lattice Sampling," Biometrics, 31, 449-471.
Lehmann, E. L. (1966), "Some Concepts of Dependence," The Annals of Mathematical Statistics, 35, 1137-1153.
Raj, Des (1968), Sampling Theory, New York: McGraw-Hill.
Satterthwaite, F. E. (1959), "Random Balance Experimentation," Technometrics, 1, 111-137.
Steinberg, H. A. (1963), "Generalized Quota Sampling," Nuclear Science and Engineering, 15, 142-145.
Tocher, K. D. (1963), The Art of Simulation, Princeton, NJ: Van Nostrand.
Youden, W. J., Kempthorne, O., Tukey, J. W., Box, G. E. P., and Hunter, J. S. (1959), Discussion of the Papers of Messrs. Satterthwaite and Budne, Technometrics, 1, 157-193.
Zacks, S. (1963), "On a Complete Class of Linear Unbiased Estimators for Randomized Factorial Experiments," The Annals of Mathematical Statistics, 34, 769-779.
Zacks, S. (1964), "Generalized Least Squares Estimators for Randomized Fractional Replication Designs," The Annals of Mathematical Statistics, 35, 696-704.
