Professional Documents
Culture Documents
An Algorithm For Fast Optimal Latin Hypercube Design of Experiments
An Algorithm For Fast Optimal Latin Hypercube Design of Experiments
Felipe A. C. Viana1
University of Florida, Gainesville, FL 32611, USA
Gerhard Venter2
Stellenbosch University, Matieland, Stellenbosch 7602, South Africa
and
Vladimir Balabanov3
The Boeing Company, Seattle, WA 98204, USA
This paper presents the Translational Propagation algorithm; a new method for
obtaining optimal or near optimal Latin hypercube designs without using formal
optimization. The procedure requires minimal computational effort with results virtually
provided in real time. The algorithm exploits patterns of point locations for optimal Latin
criterion). Small building blocks, consisting of one or more points each, are used to recreate
these patterns by simple translation in the hyperspace. Monte Carlo simulations were used
to evaluate the performance of the new algorithm for different design configurations where
both the dimensionality and the point density were studied. The proposed algorithm was also
compared against three formal optimization approaches (namely random search, genetic
algorithm, and enhanced stochastic evolutionary algorithm). It was found that (i) the
distribution of the φp values tends to lower values as the dimensionality is increased; and (ii)
1
(Corresponding author) Research Assistant, Dept. of Mechanical and Aerospace Engineering, fchegury@ufl.edu.
2
Professor, Dept. of Mechanical and Mechatronic Engineering, gventer@sun.ac.za.
3
Structural Analysis Engineer, v.o.balabanov@gmail.com.
1
I. Introduction
Design optimization usually requires a large number of potentially expensive simulations. Advancements in
computational hardware and algorithms have not alleviated much of the resulting computational crunch because of
the ever increasing appetite for improved modeling of physical processes and more detailed optimization [1]. To
reduce the computational cost, surrogate models, also known as meta-models, are often used in place of the actual
simulation models [2]-[7]. Surrogate-based design optimization begins by identifying locations in the design space
where simulations will be conducted. This process of identifying locations in the design space is known as design of
experiments (DOE) [8], [9]. Response data (often via numerical simulations) is collected at these locations and one
or more candidate surrogate models are fitted to the data [10]-[12]. Finally, one or more of the candidate models are
selected for calculating responses (and to facilitate objective and constraint calculation during the optimization
process) at points in the design space where the actual responses are not yet available.
It is well known among designers that the quality of fit, which often defines the performance of the surrogate
model during optimization and design space exploration strongly depends on the design of experiments (point
location and density) [13], [14]. By quality of fit, the authors imply the discrepancy (in a general sense) between the
actual response and the value predicted by the corresponding surrogate model. There exist many measures for the
quality of fit, depending on the particular problem under consideration [14], [15].
T
X = x1, x2 , …, xn , where each row xi = x i1, x i 2 , …, x in represents a sample and each column
p v
represents a variable. Within the design and analysis of computer experiments, the Latin hypercube design (LHD)
proposed by McKay et al. [16] and Iman and Conover [17], is very popular. The LHD presents advantages such as:
(i) the number of samples (points) is not fixed; (ii) orthogonality of the sampling points (a design is orthogonal if the
inner product of any two columns is zero [3], [18]); and (iii) the sampling points do not depend on the surrogate
model that will be constructed. A LHD with n p points is constructed in such a way that each of the nv variables is
divided into n p equal levels and that there is only one point (or sample) at each level. A random procedure is used
to determine the point locations. Figure 1 shows two examples of LHDs with nv = 2 and n p = 16 . Since the LHD
is constructed using a random procedure, there is nothing preventing a design that has poor space filling qualities, as
2
the extreme case illustrated in Figure 1-(a). A better choice is shown in Figure 1-(b), where the points are more
(a) Ill-suited LHD with nv = 2 and n p = 16 . (b) Reasonable LHD with nv = 2 and n p = 16 .
Figure 1: Examples of Latin hypercube designs.
The optimization of the space-filling qualities of the Latin hypercube design is a challenging problem that has
resulted in a number of research publications [19]-[28]. One interpretation of the space filling property is to consider
a nv -dimensional sphere around each design point in the experimental design. The larger the radius of the smallest
sphere that does not cross the boundary of the design space, the better the space filling property of the design.
Optimizing the point location in a Latin hypercube design to improve the uniformity of the point distribution,
typically by maximizing the radius of the smallest sphere, results in an Optimal Latin hypercube design. Such
designs are usually obtained from time consuming combinatorial optimization problems, with search space of the
n
order of ( n p ! ) v . For example, to optimize the location of 20 samples in 2 dimensions, the algorithm has to select
the best design from more than 1036 possible designs. If the number of variables is increased to 3, the number of
possible designs is more than 1055. Researches have proposed various optimization algorithms and objective
functions to solve this problem. Table 1 summarizes some strategies found in the literature. As for the computational
time, Ye et al. [22] reported several hours on a Sun SPARC 20 workstation for generating an optimal Latin
hypercube with 25 points for 4 variables. Jin et al. [25] reported minutes on a PC with a Pentium III 650 MHZ CPU
for generating an optimal Latin hypercube with 100 points for 10 variables. Section IV shows a comparison of
different optimization methods for obtaining optimal LHDs with contemporary computing power.
Due to its popularity, the φp criterion was selected as a performance measure in this paper. Minimizing φp
leads to the maximization of the point-to-point distance in the design (see [21] and [25] for details). Mathematically:
3
1
n p −1 n p p
φp = ∑ ∑ dij−p , (1)
i =1 j =i +1
where p is a positive integer, n p is the number of points in the design and dij is the inter-point distance between
all point pairs in the design. The general inter-point distance between any point pair xi and x j can be expressed as
follows:
1
nv t
t
dij = d ( xi , x j ) = ∑ x ik − x jk
, t = 1 or 2 . (2)
k =1
In the present work, p = 50 and t = 1 are used following the suggestions of Jin et al. [25].
Figure 1-(b) shows what point locations one can expect to obtain if the φp criterion is used to generate a LHD
with nv = 2 and n p = 16 (although other criteria may also lead to the same distribution of points).
In this work, the Translational Propagation algorithm is presented for obtaining optimum or near optimum Latin
hypercube designs. This algorithm requires minimal computational effort and does not use formal optimization. The
aim is to solve the optimization problem in an approximate sense, i.e., to obtain a good Latin hypercube quickly,
rather than finding the best possible solution. The algorithm exploits simple translation of small building blocks
(consisting of one or more points) in the hyperspace. The obtained LHD is referred to as TPLHD (a Latin hypercube
4
design obtained via the Translational Propagation algorithm). In general, the obtained TPLHD could be useful by
themselves as optimum or near optimum Latin hypercube designs, or as an initial guess for numerical
The rest of the paper is organized as follows. Section II describes the translational propagation algorithm for
obtaining Latin hypercube designs. Section III describes the numerical experiments used in this study. Section IV
presents the results and discussion. Finally, the paper is closed by recapitulating salient points and concluding
remarks.
A. Basic Algorithm
The proposed approach is based on the idea of constructing the optimal nv -dimensional Latin hypercube design
from a fairly small nv -dimensional seed design. The simple example of a 16 × 2 (i.e., sixteen points in two
dimensions) Latin hypercube design is used to explain the methodology. Figure 2 shows some possible two-
dimensional seed designs. While there is no limitation on the number of points for the seed design, the seed design
can be as simple as just a single point ( 1 × nv design). For simplicity, the seed of Figure 2-(a) is seed used in the
(a) 1x2 seed design. (b) 2x2 seed design. (c) 3x2 seed design. (d) 4x2 seed design.
Figure 2: Example of seed designs for 2 variables.
In order to construct a Latin hypercube design of n p points from a seed design of ns points, the design space is
np
nb = . (3)
ns
Equation (3) means that each dimension is partitioned into the same number of divisions, nd , calculated by:
5
1
nd = ( nb )nv . (4)
In the example of the 16 × 2 Latin hypercube design (i.e. n p = 16 and nv = 2 ), considering ns = 1 (seed
design selected from Figure 2-(a)), one obtains nb = 16 and nd = 4 from Eqs. (3) and (4).
n p = 16
nv = 2
ns = 1
nb = 16
nd = 4
Figure 3: 16 × 2 Latin hypercube mesh divided into blocks (4 divisions in each dimension results in 16
blocks). The left-lower block is the first one to be picked in the algorithm.
Next, each block is filled with the previously selected seed design. Figure 3 shows the division of the design
space for the 16 × 2 Latin hypercube design and which of the blocks is the first one to be picked in the algorithm.
Figure 4 illustrates the process step by step. First, the seed design is properly scaled and placed at the origin of the
first block, as shown in Figure 4-(a). Next, the block with the seed design is iteratively shifted by n p nd levels in
one of the dimensions. Every time that the seed design is shifted, a new point is added to the experimental design.
Figure 4-(b) shows the first shift of the seed design (chosen to be in the horizontal direction). To preserve the Latin
hypercube property of only a single point per level; Figure 4-(b) shows that there also has to be a one-level shift in
the vertical direction. In the general case, a displacement vector is built for each accounting for the shifting in the
dimension of interest (horizontal direction in the example) as well a shift in all other dimensions to preserve the
Latin hypercube properties (vertical direction in our example). The shifting process continues in one of the
dimensions until all the divisions in this dimension are filled with the seed design as illustrated in Figure 4 (a-d). In
the next step, the current set of points (newly filled division) is used as a new seed design and the procedure of
6
shifting the seed design is repeated in the next dimension. Figure 4 (e-g) illustrates the shifting procedure in the
vertical direction.
placement of the seed. Figure 4-(b) to (d) shows the translation of the seed in the horizontal direction (which
is accompanied by a one-level vertical displacement to preserve Latin hypercube properties). Figure 4-(d) also
represents the newly created “seed” that will be translated in the vertical direction. Figure 4-(e) to (g)
illustrates the translation in the vertical direction (which is accompanied by horizontal displacement of one
level).
The biggest advantage of this approach is that there are no calculations to perform. All operations can be viewed
as a simple translation of the seed designs in the nv -dimensional hypercube. Although efficient for generating large
designs, the algorithm proposed up to now fails to provide flexibility for the total number of points in the final Latin
hypercube design. The approach presented so far is limited in the sense that Eqs. (3) and (4) must hold. The next
section describes a strategy to overcome this limitation and generate designs with arbitrary number of points.
at least the required number of points using the algorithm described above. If this design contains the required
number of points, the process is completed. Otherwise, an experimental design larger than the required is created
7
and a resizing process is used to reduce the number of points to the desired one. The points are removed one-by-one
from the initially created TPLHD by discarding the points that are the furthest from the center of the hypercube and
reallocating remaining points to fill the whole design (preserving the Latin hypercube properties). In the proposed
algorithm removing the points furthest from the center does not reduce the area of exploration. After removing the
points, the final design is rescaled to cover the whole design space.
To illustrate the approach, consider the example where a 12 × 2 Latin hypercube is created using the one point
1 nv
seed of Figure 2-(a) (i.e., ns = 1 ). From Eqs. (3) and (4), nd = n p ≅ 3.46 does not result in an integer number
in this case. Rounding nd ≅ 3.46 up would give nd = 4 and the next largest design that can be constructed is the
16 × 2 , as illustrated in Figure 4. The resizing process begins with first calculating the distance between each of the
sixteen points and the center of the design space. To create a 12 × 2 design out of a 16 × 2 one, the four points
furthest from the center have to be removed. In practice, this means the points of the original TPLHD have to be
ranked according to the distance from the center of the design space. The suggested resizing algorithm, computes
these distances only one time, during the very first iteration. When a point is removed, the levels occupied by its
projection along each of the dimensions have to be eliminated. This ensures the Latin hypercube property that only a
single point is found at any of the levels. Figure 5 illustrates the resizing process step by step. The number of points
in the design actually shrinks but the final design still represents samples over the same design space. Figure 5-(a)
shows that in the 16 × 2 design the points in the left-bottom and right-top corners are equally far from the center.
Due to symmetry, it is not important which of the points will be removed first. The top-right point is removed first,
Figure 5-(a); and the bottom-left one is removed next, Figure 5-(b). Figure 5-(c) might be the best illustration of how
the algorithm guarantees the Latin hypercube properties because it is the first time that an internal level is removed.
Removing a point is as simple as eliminating it from the experimental design. However, as illustrated in Figure 6,
this would leave empty levels (breaking the Latin hypercube requirements). The correct implementation of the
resizing algorithm (see Figure 5-(c)) takes care of this limitation by also eliminating the levels that the point used to
occupy. In Figure 5-(c) this means that the three points on the right would move to the left (occupying the empty
level that was in between the points). Next, the remaining points are scaled to cover the original design space. The
scaling is as simple as the mapping of the remaining points to the design space of interest such that the lower and
upper bounds are sampled by one of the points of the design (i.e., simple unidirectional scaling of all points). After
8
each step, a new near optimum Latin hypercube is obtained with one point less. The process continues until the
12 × 2 design is achieved. Removing points/levels part of the algorithm reduces the number of points of the
experimental design, while preserving the Latin hypercube requirements. On the other hand, the resizing part makes
(a) Step 1.
(b) Step 2.
(c) Step 3.
(d) Step 4.
Figure 5: Process of creating a 12 × 2 enhanced Latin hypercube design from a 16 × 2 design.
C. Seed designs
Using the described algorithm, one can create different instances of the TPLHD by changing the seed design, as
illustrated by Figure 7 for the 16 × 2 TPLHD. As it is not known beforehand which seed design will lead to the best
9
design in terms of the φp criterion, one would not know what seed design to use. However, this might be a positive
feature of the algorithm. First, it is possible that the different designs are comparably good (when looking at the φp
criterion). Second, due to the low cost of the algorithm (as we will show in the next sections), it pays to create
instances of the Latin hypercube with different seeds and then pick the best one according to the φp criterion.
(a) From 1-point seed. (b) From 2-points seed. (c) From 3-points seed. (d) From 4-points seed.
Figure 7: 16 × 2 TPLHD obtained with different seeds.
Because of the shifting process, the number of levels in a block is greater than the number of points in the seed
design. This means that when creating design starting from seeds with more than one point ( ns > 1 ) it is necessary
to reshape them to fit into one block of the TPLHD. Figure 8 illustrates this idea with the design of a 16 × 2
TPLHD starting from a 4 × 2 seed (i.e., ns = 4 ). Figure 8-(a) shows the initial 4 × 2 seed (that respects the Latin
hypercube properties). It is necessary to insert as many empty levels between two consecutive projections of points
as the number of division in a given directions, as shown in Figure 8-(b). This way, at the end of the shifting
10
process, there will be only one point per level (see Figure 8-(c)). The next section reviews the implementation of
(b) Seed reshaped to fit in one block (c) Seed design filling different
(a) Original seed.
of the TPLHD. blocks of the TPLHD.
Figure 8: Illustration of the reshaping of the seed design.
(due to p = 50 and t = 1 in Eq. (1)). Good Latin hypercube designs are expected to be obtained because the
minimum distance is fixed at the level of seed design. This is particularly clear for large number of points in low
dimensions. Figure 9 illustrates the Translational Propagation algorithm. The input parameters are the initial seed
design s , the number of points in the seed design, ns , number of points of the desired Latin hypercube design, n p ,
and the number of variables, nv . n p∗ and nd∗ are the number of points and number of divisions of the first TPLHD
to be created. While n p can assume any value, n p∗ is such that Eq. (3) returns an integer number of blocks nb . The
first step is to check whether or not a bigger experimental design is needed. The number of divisions, nd , and its
rounded up value, nd∗ , are compared. If nd∗ > nd , Eqs. (3) and (4) does not hold and the number of points in the
TPLHD to be created, n p∗ , is greater than the desired, n p . Next, the seed design is reshaped to fit in one block of the
TPLHD (Section II.C), and a TPLHD of n p∗ points is created (Section II.A). Finally, if n p∗ > n p , the TPLHD
previously obtained is resized such that it will have n p points (Section II.B). X is the final TPLHD (with n p
points).
Appendix A gives the MATLAB implementation of the different building blocks of the translational propagation
algorithm.
11
III. Numerical Experiments
To illustrate the effectiveness of the proposed TPLHD, a set of configurations that covers the typical application
range of the LHD is considered. The experimental designs vary from only two to twelve variables as summarized in
Table 2. For each number of variables, three cases representing small, medium and large number of design points
were considered. The number of design points was calculated relative to the number of coefficients in a full
polynomial model for the specified number of variables. The small designs have two times more points than the
number of coefficients in a full quadratic polynomial model, while the large designs have twenty times more points
than the number of coefficients in the same model. The medium designs have two times more points than the
12
Table 2: Latin hypercube design configurations considered.
No. of No. of points
variables Small designs Medium designs Large designs
2 12 20 120
4 30 70 300
6 56 168 560
8 90 330 900
10 132 572 1320
12 182 910 1820
The influence of the initial seed used to construct the TPLHD was studied first. For this study, different TPLHDs
were generated using seeds with one to five points. The designs were then compared using the values of the φp
criterion. Next, the efficiency of the proposed algorithm in approximating the optimal LHD was studied. For this
study, an estimate of the range of the φp criterion for the LHDs proposed in Table 2 is created based on Monte
Carlo simulation plus the results from 100 simulations of three different Latin Hypercube optimization techniques.
The Monte Carlo simulation created 200,000 random LHDs for the configurations proposed in Table 2 (this is
possible because the Latin hypercube algorithm allows for the creation of different designs based on a random
number generator). The three different Latin Hypercube optimization techniques used in the study are:
1. The Enhanced Stochastic Evolutionary algorithm (ESEA) of Jin et al. [25]: set with maximum number of
twenty iterations (but each simulation was allowed to run at most five iterations without improvement of
2. The Genetic algorithm (GA) implementation of Bates et al. [24]: set with maximum number of fifty
iterations (but each simulation was allowed to run at most twenty iterations without improvement of the
objective function).
3. The native MATLAB function lhsdesign [30]: using the maxmin criterion (maximization of the minimum
distance) and two hundred iterations. With these settings, the MATLAB lhsdesign function selects the best
LHD from two hundred randomly created designs. The maxmin criterion is used to select the best design.
The settings for both GA and ESEA were based on a few trials for the design with 560 points and six variables.
All simulations were conducted using an Intel Core 2 Quad CPU Q6600 at 2.40GHz, with 3GB of RAM running
MATLAB 7.6 (R2008a) under Windows XP. The SURROGATES toolbox [29] was used to execute the TPLHD,
13
Instead of showing only the φp criterion as defined in Eq. (1), a normalized version, φp , is also presented for
easier comparison:
φp − min ( φp )
φp = , (5)
max ( φp ) − min ( φp )
where max ( φp ) and min ( φp ) are the maximum and minimum values of φp found in the generated DOEs
(including both the TPLHDs and the Monte Carlo simulations); which means that 0 ≤ φp ≤ 1 .
Using the Monte Carlo simulations and the optimization results, it is possible to estimate the range of variation
of φp values and thus make a judgment if a specific design represents a substantial improvement or not. The φp
criterion makes the comparison of the distributions of different designs easier. Although it eliminates the sense of
absolute magnitude (because of the normalization), it permits identifying if the distributions are towards the low or
Table 3 shows the values of the φp criterion for TPLHDs obtained using different initial seed designs for design
configurations of Table 2. From these results, it can be concluded that no single seed size is always the best (even
considering designs with the same number of variables). For example, the single point seed is the best for the
12 × 2 and 20 × 2 designs, while the two-point seed is the best for the 120 × 2 design. However, there might be
cases where different seeds are equally competitive. As illustration, this is the case for the 30 × 4 design, where the
four-point and five-point seeds produce designs with nearly the same φp criterion. These two observations together
with the fact that no time consuming optimization is required in the translational propagation algorithm suggests that
it is possible to create several instances of the TPLHDs (based on different initial seeds) and select the design of
choice based on the φp criterion. Table 3 also shows that an increase in dimensionality seems to favor the single
point seed. The reason is that in higher dimensions a seed with multiple points acts like a cluster of points, hurting
14
Table 3: φp criterion, defined in (1), ( p = 50 , t = 1 ) for TPLHD constructed from different seeds (the best
Table 4 shows detailed statistics regarding the φp criterion with data obtained during the Monte Carlo and
optimization simulations (the 5th percentile is added to give information about the lower values of φp ). For up to six
variables, TPLHD delivers exceptionally good designs (within five percent of the best designs for most of the cases).
For eight to twelve variables, TPLHD is not attractive anymore (with designs located within the last quartile of the
φp criterion distribution). The reader will notice that for ten variables the maximum value of φp drops when
coming from 572 to 1320 points (contrarily from the trend of the other design configurations). This is because it is
difficult to estimate this value for high dimensional cases (even with total of more than 200,000 simulations). On the
other hand, the percentile information establishes the trend of increasing φp with the number of points. Since the
algorithm was heuristically developed based on the visual observation of the patterns generated by the optimal LHD
in two and three dimensions according to the φp criterion, it is of no surprise that the efficiency is harmed when the
dimensionality increases. The range of the φp criterion varies with different design configurations and can also be
very large (for example, the range for the 120 × 2 design is 52.9). Thus, only the normalized φp criterion, φp , is
15
Table 4: TPLHD and statistics about the φp criterion, defined in (1), ( p = 50 , t = 1 ) for each studied
configuration. Bold faces show when TPLHD offers the best design. TPLHD performs very well up to 6
dimensions.
Figure 10 illustrates the box plots of the φp criterion for each of the studied configurations. A box plot shows
the full data range on the y-axis. A box is defined by lines at the lower quartile (25%), median (50%), and upper
quartile (75%) values. Lines extend from each end of the box for a distance of 1.5 times the inter-quartile range or
until the data limit is reached. The data points falling outside this 1.5 times the inter-quartile range represent outliers
that are shown using “+” symbols. Figure 10 shows that the distribution of φp tends to lower values as the
dimensionality of the problem grows. It does not necessarily mean that the LHD fills the space better; after all, the
sparsity of the points severely increases with the number of variables. However, this behavior suggests that the
probability of generating a design in the lower levels of the φp increases with the dimensionality. The following
• In low dimensions: the optimum LHD is an outlier located in the lower region of the φp values; making it
difficult to be obtained.
16
• In high dimensions: most of the LH designs are in the lower region of the φp ; making it difficult to
distinguish the optimal LHD (obviously according to this criterion only) and other several good
Figure 10 makes clear that the TPLHD represents a better approximation of the optimum LHD in low to medium
dimensions (up to six variables) and thus we will focus only on the designs with up to six variables.
Table 5 allows a comparison between the TPLHD and the mean out of 100 simulations of three different
optimization techniques: (i) the Enhanced Stochastic Evolutionary algorithm (ESEA) of Jin et al. [25]; (ii) the
Genetic algorithm (GA) implementation of Bates et al. [24]; and (iii) the native MATLAB function lhsdesign [30].
TPLHD was obtained by generating five candidates from different seeds and then picking the best one according to
the φp criterion (which means that Table 5 shows the time needed to generate all five candidates. Appendix B
shows the discussion on the number of points that needs to be allocated in each of the cases; which directly impacts
the computational cost). Results in Table 5 reinforce that TPLHD offers very good solutions; with most of the
designs presenting φp = 0 . This means that TPLHD found the best result of the set of all simulations (including the
Monte Carlo and all optimization ones). As for the computational cost, the TPLHD design is superior in most of the
cases, or in the worst case matches the best time of the traditional optimization techniques. Clearly, the point density
has a dramatic impact on the optimization techniques (especially ESEA and GA). For instance, for the ESEA with
six variables, moving from 56 to 560 points makes the computational cost change from two seconds to a little more
than 18 minutes. The lhsdesign MATLAB function tends to suffer less, since it can be seen as a Monte Carlo
simulation with only 200 samples (200 is the number of iterations that lhsdesign was allowed to run). Still, in six
dimensions, when moving from 56 to 560 points the computational cost of the lhsdesign increases more than 100
times.
17
(a) Low to medium dimensions.
TPLHD. TPLHD represents a good solution in low to medium dimensions. φp ranges from 0 to 1.
Table 5 gives the impression that increasing the number of variables (no matter the point density) made the three
optimization techniques to improve their own performance (in terms of the φp criterion). Considering the φp
criterion of the designs with four and six variables, TPLHD and any of the optimization techniques would offer
designs that are below 0.1. Then, the computational cost would leave only TPLHD and the lhsdesign as competing
strategies. However, a closer look at the distributions of the φp criterion shows that TPLHD is the best choice.
Figure 11 contrasts the box plot of the φp criterion with the values found by TPLHD and the 0.1 threshold of
lhsdesign. Because the distributions tends to lower values, the 0.1 threshold is a bad value for the 300 × 4 design
and is a marginal to undesired value in 6 dimensions. It is clear that TPLHD offers a better design (not mentioning a
faster solution).
18
Table 5: Performance comparison between TPLHD and the median out of 100 simulations of different
optimization techniques. ESEA is the implementation of the Enhanced Stocastic Evolutionary algorithm of
Jin et al. [25]. lhsdesign is a native MATLAB function. GA refers to the Genetic algorithm implementation of
Bates et al. [24]. φp , defined in (5), ranges from 0 to 1.
No. of variables 2 4 6
No. of points 12 20 120 30 70 300 56 168 560
TPLHD 0.1 0.1 0 0 0 0 0.1 0 0
φp ESEA 0.2 0.1 0.1 0 0 0 0 0 0
GA 0.2 0.3 0.3 0.1 0.1 0 0 0 0
lhsdesign 0.5 0.5 0.6 0.1 0.1 0.1 0.1 0. 0.1
TPLHD ≅0 ≅0 0.1 ≅0 ≅0 0.7 ≅0 0.5 2
Time ESEA ≅0 0.2 13 0.3 3 173 2 34 1096
(sec) GA 0.4 0.9 24 4 17 275 19 143 1509
lhsdesign ≅0 ≅0 0.2 ≅0 0.1 1.8 0.1 1.0 13
Figure 11: Boxplots of the φp criterion between 0 and 0.5 for the four and six dimensional designs. φp is
One of the reasons for the poor performance of the TPLHD in high dimensions might be the existence of the
directionality property of TPLHD. Because of the way the points are created in TPLHD, they tend to be stretched
along one direction. Even the optimal LHD (obtained with ESEA or GA for example) from Figure 1-(b) may be
considered as having points placed in a preferred direction (close to 45 degrees). Goel et al. [14] discussed that a
single criterion for generating design of experiments may lead to large deteriorations in other criteria. In the
proposed algorithm, the φp criterion is used and the directionality of the resulting Latin hypercube designs (for both
the TPLHD and the designs obtained with traditional optimization) is a clear loss. There has been work on the
optimization of the space-filling properties of the LHD while preserving low correlation between variables. Cioppa
and Lucas [31] presented an algorithm that improves the space-filling properties of LHD at the expense of inducing
small correlations between the columns in the design matrix. However, authors warned that the approach is
19
computationally prohibitive. Hernandez [32] presented an extensive study on a set of methodologies to create design
matrices with little or no correlation (including saturated nearly orthogonal LHDs). Franco et al. [33] discussed a
radar-shaped statistic for identifying the directionality property in a DOE (especially efficient in low dimension).
However there has been little research on how to employ this statistic for improving DOE and not just for
identifying the directionality in an existing DOE, the directionality statistic was not employed to improve existing
DOE in this paper. Nevertheless, the focus of this research is the translational propagation algorithm as a cheap
alternative to traditional optimization of the Latin hypercube designs based on the φp criterion.
Other, more general questions arise in higher dimensions: it is not clear what is the meaning of space-filling
designs and the relative cost of any optimization strategy of experimental designs. The sparsity of the data may
make it difficult to judge whether designs given by the optimization of the Latin hypercube are in fact space-filling
designs. Even if they are, considering plots such as those in Figure 10, the computational cost of strategies like
TPLHD may end up being very close to random search (such as in lhsdesign). It may happen that in higher
dimensions it is not worth investing much in time consuming optimization (few iterations of random search may
provide satisfactory results). In spite of that, authors intend to investigate the potential of the radar-shaped statistic in
V. Conclusions
A methodology for creating Latin hypercube designs via translational propagation algorithm (TPLHD) is
proposed. The approach is based on the idea that a simple “seed design” with a few points can be used as a building
block to construct a near optimal Latin Hypercube design (LHD). The main advantage of the proposed methodology
is that it requires virtually no computational time. The approach was tested on eighteen different experimental
design configurations with varying dimensionality and point density. Monte Carlo simulations were used to support
the analysis of the algorithm’s performance. For these cases it was found that:
• Based on the Monte Carlo simulations, the probability of randomly generating good designs in terms of the
φp criterion (i.e. designs with low values of φp ) increases with the dimensionality. This does not mean
that the designs will fill the space better, it only means that designs tends to be equally competitive. This
fact makes computationally cheap strategies such as random search attractive in high dimensions.
20
• It was found that for the low dimensional cases (up to six variables) the TPLHD approximates the optimal
LHD (using φp criterion) very well. In higher dimensions (eight to twelve variables), the TPLHD does not
As the time for generating the TPLHD is very small, in small to medium dimensions it is recommended to
generate several TPLHDs (using seed designs with different number of points) and to pick the best one according to
the φp criterion.
Acknowledgments
Authors would like to thank everybody at Vanderplaats Research and Development, Inc. (VR&D) for the
encouragement and support. This work was started when authors were employed by VR&D in various positions.
Appendix
For simplicity, the levels of the Latin hypercube are represented by integer values. This means that each of the
points of the created Latin hypercube design can assume values from one to the number of points. The created Latin
hypercube needs to be properly scaled to the design space of interest. Table 6 gives the main function; which has as
input parameters the number of points, n p , and variables, nv , of the desired Latin hypercube design, as well as the
seed design, s , and its number of points, ns . The points in the seed design initially assume values from one to the
ns . Lines 9 to 17 of the algorithm shown in Table 6 illustrate how to take care of the size of the Latin hypercube to
be created using the translational propagation algorithm. At first, a Latin hypercube observing Eqs. (3) and (4) is
created; even if it means creating a design bigger than required. The seed design is then properly placed into the first
block of to be filled by the algorithm (line 19). Line 43 is the call for the function that creates the initial Latin
hypercube design. If necessary, the design is then resized to the initially set configuration. Table 7 gives the
implementation of the reshaping of the seed design (see Section II.C). Table 8 illustrates the body of the
translational propagation algorithm. Lines 16 to 18 show how to create the displacement vector that is used to
translate the seed in the hyperspace. Finally, Table 9 presents the implementation of the resizing process. Lines 8 to
21
15 show how we reduce the initially large Latin hypercube to the n p initially set. Lines 18 to 30 present how to
restore the Latin hypercube condition of one point per level to the remaining points.
Table 6: MATLAB implementation of the translational propagation algorithm. The created design ranges
from one to the number of points. The algorithm works with integer indexes for each point. ceil ( x ) is the
MATLAB function that rounds the elements of x up to the nearest integers. ones ( m, n ) is the MATLAB
function that creates an m -by- n matrix of ones. See Table 7 for details about reshapeSeed. See Table 8
for details about createTPLHD. See Table 9 for details about resizeTPLHD. vertcat ( A, B ) is the
MATLAB function that performs the vertical concatenation of matrices A and B.
1: function X = tplhsdesign(np, nv, seed, ns)
2: % inputs: np – number of points of the desired Latin hypercube (LH)
3: % nv – number of variables in the LH
4: % seed – initial seed design (points within 1 and ns)
5: % ns – number of points in the seed design
6: % outputs: X – Latin hypercube created using the translational
7: % propagation algorithm
8:
9: % define the size of the TPLHD to be created first
10: nd = ( np/ns )^( 1/nv ); % number of divisions, nd
11: ndStar = ceil( nd );
12: if (ndStar > nd)
13: nb = ndStar^nv; % it is necessary to create a bigger TPLHD
14: else
15: nb = np/ns; % it is NOT necessary to create a bigger TPLHD
16: end
17: npStar = nb*ns; % size of the TPLHD to be created first
18:
19: % reshape seed to properly create the first design
20: seed = reshapeSeed(seed , ns, npStar, ndStar, nv);
21:
22: % create TPLHD with npStar points
23: X = createTPLHD(seed, ns, npStar, ndStar, nv);
24:
25: % resize TPLH if necessary
26: npStar > np;
27: if (npStar > np)
28: X = resizeTPLHD(X, npStar, np, nv);
29: end
30: return
22
Table 7: MATLAB implementation of the reshapeSeed function. ones ( m, n ) is the MATLAB function
that creates an m -by- n matrix of ones. round ( x ) is the MATLAB function that rounds the elements of X to
the nearest integers.
1: function seed = reshapeSeed(seed , ns, npStar, ndStar, nv)
2: % inputs: seed – initial seed design (points within 1 and ns)
3: % ns – number of points in the seed design
4: % npStar – number of points of the Latin hypercube (LH)
5: % nd – number of divisions of the LH
6: % nv – number of variables in the LH
7: % outputs: seed – seed design properly scaled
8:
9: if ns == 1
10: seed = ones(1, nv); % arbitrarily put at the origin
11: else
12: uf = ns*ones(1, nv);
13: ut = ( (npStar / ndStar) - ndStar*(nv - 1) + 1 )*ones(1, nv);
14: rf = uf - 1;
15: rt = ut - 1;
16: a = rt./rf;
17: b = ut - a.*uf;
18: for c1 = 1 : ns
19: seed(c1,:) = a.*seed(c1,:) + b;
20: end
21: seed = round(seed); % to make sure that the numbers are integer
22: end
23:
24: return
23
Table 8: MATLAB implementation of the createTPLHD function. ones ( m, n ) is the MATLAB function
that creates an m -by- n matrix of ones. length ( a ) is the MATLAB function that returns the number of
entries in the vector a . vertcat ( A, B ) is the MATLAB function that does the vertical concatenation of the
matrices A and B .
1: function X = createTPLHD(seed, ns, npStar, ndStar, nv)
2: % inputs: seed – initial seed design (points within 1 and ns)
3: % ns – number of points in the seed design
4: % npStar – number of points of the Latin hypercube (LH)
5: % nd – number of divisions of the LH
6: % nv – number of variables in the LH
7: % outputs: X – Latin hypercube design created by the translational
8: % propagation algorithm
9: % we warn that this function has to be properly translated to other
10: % programming languages to avoid problems with memory allocation
11:
12: X = seed;
13: d = ones(1, nv); % just for memory allocation
14: for c1 = 1 : nv % shifting one direction at a time
15: seed = X; % update seed with the latest points added
16: d(1 : (c1 - 1)) = ndStar^(c1 - 2);
17: d(c1) = npStar/ndStar;
18: d((c1 + 1) : end) = ndStar^(c1 - 1);
19: for c2 = 2 : ndStar % fill each of the divisions
20: ns = length(seed(:,1)); % update seed size
21: for c3 = 1 : ns
22: seed(c3,:) = seed(c3,:) + d;
23: end
24: X = vertcat(X, seed);
25: end
26: end
27: return
24
Table 9: MATLAB implementation of the resizeTPLHD function. ones ( m, n ) is the MATLAB function
that creates an m -by- n matrix of ones. zeros ( m, n ) is the MATLAB function that creates an m -by- n
matrix of zeros. norm ( x ) is the MATLAB function that computes the vector norm ( ∑ ( xi2 ) ). min ( x ) is
the MATLAB function that finds the smallest element in x. sort ( a ) is the MATLAB function that sorts the
vector a in the ascending order, returning the sorted and the index vectors. sortrows ( X, COL ) is the
MATLAB function that sorts the matrix X based on the columns specified in the vector COL .
isequal(A, B) is the MATLAB function that returns logical 1 (TRUE) if arrays A and B are the same size
and contain the same values, and logical 0 (FALSE) otherwise.
1: function X = resizeTPLHD(X, npStar, np, nv)
2: % inputs: X – initial Latin hypercube design
3: % npStar – number of points in the initial X
4: % np – number of points in the final X
5: % nv – number of variables
6: % outputs: X – final X, properly shrunk
7:
8: center = npStar*ones(1,nv)/2; % center of the design space
9: % distance between each point of X and the center of the design space
10: distance = zeros(npStar, 1);
11: for c1 = 1 : npStar
12: distance(c1) = norm( ( X(c1,:) - center) );
13: end
14: [dummy, idx] = sort(distance);
15: X = X( idx(1:np), : ); % resize X to np points
16:
17: % re-establish the LH conditions
18: Xmin = min(X);
19: for c1 = 1 : nv
20: % place X in the origin
21: X = sortrows(X, c1);
22: X(:,c1) = X(:,c1) - Xmin(c1) + 1;
23: % eliminate empty coordinates
24: flag = 0;
25: while flag == 0;
26: mask = (X(:,c1) ~= ([1:np]'));
27: flag = isequal(mask,zeros(np,1));
28: X(:,c1) = X(:,c1) - (X(:,c1) ~= ([1:np]'));
29: end
30: end
31: return
25
B. Number of points allocated before resizing the TPLHD
As discussed in Section II, the number of points that needs to be allocated for a given configuration of the
TPLHD is dictated by Eqs. (3) and (4). In the most general cases, a resizing process needs to be used. Thus, the total
number of points initially generated dictates the computational cost of the TPLHD algorithm. Table 10 shows the
number of points allocated before the resizing process for each of the configurations studied in this paper. It can be
observed that the more variables, the more points are discarded. This confirms the fact that design strategies based
on the domain subdivision (the translational propagation algorithm is an example) tend to have poor performance as
dimensionality increases.
Table 10: Size of the TPLHD created before the resizing process for each seed size. The final best TPLHD is
References
[1] Venkataraman S, and Haftka RT, Structural optimization complexity: what has Moore’s law done for us?, Structural and
Multidisciplinary Optimization, Vol. 28, No. 6, pp. 375-387, 2004.
[2] Sacks J, Welch WJ, Mitchell TJ, and Wynn HP, Design and Analysis of Computer Experiments, Statistical Science, Vol. 4
(4), pp. 409-435, 1989.
[3] Simpson TW, Peplinski JD, Koch PN, and Allen JK, Meta-models for computer based engineering design: Survey and
recommendations. Engineering with Computers, Vol. 17, No. 2, pp. 129-150, 2001.
[4] Queipo NV, Haftka RT, Shyy W, Goel T, Vaidyanathan R, and Tucker, PK, Surrogate-based analysis and optimization,
Progress in Aerospace Sciences, Vol. 41, pp. 1-28, 2005.
[5] Simpson TW, Toropov V, Balabanov V, and Viana FAC, Design and Analysis of Computer Experiments in
Multidisciplinary Design Optimization: a Review of How Far We Have Come – or Not, in: Proceedings of the 12th
AIAA/ISSMO Multidisciplinary Analysis and Optimization Conference, Victoria, Canada, September 10-12, 2008. AIAA
2008-5802.
[6] Kleijnen JPC, Design and analysis of simulation experiments, Springer Verlag, 2007.
[7] Forrester AIJ and Keane AJ, Recent advances in surrogate-based optimization, Progress in Aerospace Sciences, Vol. 45,
No. 1-3, pp. 50-79, 2009.
[8] Montgomery DC, Design and analysis of experiments, John Wiley & Sons, 2004.
[9] Simpson TW, Lin DKJ, and Chen W, Sampling Strategies for Computer Experiments: Design and Analysis, International
Journal of Reliability and Applications, Vol. 2, No. 3, pp.209-240, 2001.
[10] Viana FAC, Haftka RT, and Steffen V., Multiple surrogates: how cross-validation errors can help us to obtain the best
predictor, Structural and Multidisciplinary Optimization, available online (DOI: 10.1007/s00158-008-0338-0).
26
[11] Viana FAC and Haftka RT, Using Multiple Surrogates for Metamodeling, 7th ASMO-UK/ISSMO International Conference
on Engineering Design Optimization, Bath, UK, July 7-8, 2008.
[12] Samad A, Kim K, Goel T, Haftka RT, and Shyy W, Multiple surrogate modeling for axial compressor blade shape
optimization, Journal of Propulsion and Power, Vol. 24, N. 2, pp. 302-310, 2008.
[13] Giunta AA, Wojtkiewicz SF, Eldred MS, Overview of modern design of experiments methods for computational
simulations. 41st AIAA Aerospace Sciences Meeting and Exhibit, Reno, NV, AIAA-2003-0649, 6–9 January 2003.
[14] Goel T, Haftka RT, Shyy W, and Watson LT, Pitfalls of using a single criterion for selecting experimental designs,
International Journal for Numerical Methods in Engineering, Vol. 75, No. 2, pp. 127–155, 2008.
[15] Myers RH and Montgomery DC, Response Surface Methodology. Process and Product Optimization using Designed
Experiments, John Wiley and Sons, 1995.
[16] McKay MD, Beckman RJ, and Conover WJ, A Comparison of Three Methods for Selecting Values of Input Variables from
a Computer Code, Technometrics, Vol. 21, pp. 239-245, 1979.
[17] Iman RL, and Conover WJ, Small sample sensitivity analysis techniques for computer models, with an application to risk
assessment, Communications in Statistics, Part A. Theory and Methods, Vol. 17, pp. 1749-1842, 1980.
[18] Kleijnen JPC, Sanchez SM, Lucas TW, and Cioppa TM, A user’s guide to the brave new world of designing simulation
experiments, INFORMS Journal on Computing, Vol. 17, No. 3, pp. 263-289, 2005.
[19] Audze P and Eglājs V, New Approach for Planning out of Experiments, Problems of Dynamics and Strengths, Vol. 35,
1977, pp. 104-107, Riga, Zinatne Publishing House (in Russian).
[20] Park JS, Optimal Latin-hypercube designs for computer experiments. Journal of Statistical Planning and Inference, Vol.
39, pp. 95–111, 1994.
[21] Morris MD, and Mitchell TJ, Exploratory designs for computational experiments, Journal of Statistical Planning and
Inference, Vol. 43, pp. 381-402, 1995.
[22] Ye KQ, Li W, and Sudjianto A, Algorithmic construction of optimal symmetric latin hypercube designs. Journal of
Statistical Planning and Inference, Vol. 90, pp. 145-159, 2000.
[23] Fang KT, Ma CX, and Winker P, Centered L2-discrepancy of random sampling and Latin hypercube design and
construction of uniform designs, Mathematics of Computation, Vol. 71, pp. 275-296, 2002.
[24] Bates SJ, Sienz J and Toropov VV, Formulation of the optimal Latin hypercube design of experiments using a permutation
genetic algorithm, 45th AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics and Materials Conference, AIAA-
2004-2011, Palm Springs, California, Apr. 19-22, 2004.
[25] Jin R, Chen W, and Sudjianto A, An Efficient Algorithm for Constructing Optimal Design of Computer Experiments,
Journal of Statistical Planning and Inference, Vol. 134, pp. 268-287, 2005.
[26] Liefvendahl M and Stocki R, A study on algorithms for optimization of Latin hypercubes, Journal of Statistical Planning
and Inference, Vol. 136, No. 9, pp. 3231-3247, 2006.
[27] van Dam E, Husslage B, den Hertog D, Melissen H, Maximin Latin hypercube designs in two dimensions, Operations
Research, Vol. 55, No. 1, pp.158-169, 2007.
[28] Grosso A, Jamali A, and Locatelli M, Finding maximin Latin hypercube designs by Iterated Local Search heuristics,
European Journal of Operational Research, Vol. 197, No.2, pp. 541-547, 2009.
[29] Viana FAC, SURROGATES ToolBox, http://fchegury.googlepages.com, 2009.
[30] Mathworks contributors, 2008, MATLAB® The language of technical computing, Version 7.6.0.324 R2008a, The
MathWorks Inc.
[31] Cioppa TM and Lucas TW, Efficient nearly orthogonal and space-filling Latin hypercubes, Technometrics, Vol 49, No. 1,
pp. 45-55, 2007.
[32] Hernandez AS, Breaking Barriers to Design Dimensions in Nearly Orthogonal Latin Hypercubes, PhD Thesis, Naval
Postgraduate School, Monterey, CA, USA, 2008.
[33] Franco J, Carraro L, Roustant O, and Jourdan A, A radar-shaped statistic for testing and visualizing uniformity properties in
computer experiments, Proceedings of the Joint ENBIS-DEINDE 2007 Conference: Computer Experiments versus Physical
Experiments, Turin, Italy, April 11-13, 2007.
27