Download as doc, pdf, or txt
Download as doc, pdf, or txt
You are on page 1of 38

J Glob Optim (2007) 37:405436 DOI 10.

1007/s10898-006-9056-6 ORIGINAL PAPER

On initial populations of a genetic algorithm for continuous optimization problems


HeikkiMaaranen KaisaMiettinen Antti Penttinen

Received: 7 June 2006 / Accepted: 10 June 2006 / Published online: 20 July 2006 Springer Science+Business Media B.V. 2006

Abstract Genetic algorithms are commonly used metaheuristics for global optimization, but there has been very little research done on the generation of their initial population. In this paper, we look for an answer to the question whether the initial population plays a role in the performance of genetic algorithms and if so, how it should be generated. We show with a simple example that initial populations may have an effect on the best objective function value found for several generations. Traditionally, initial populations are generated using pseudo random numbers, but there are many alternative ways. We study the properties of different point generators using four main criteria: the uniform coverage and the genetic diversity of the points as well as the speed and the usability of the generator. We use the point generators to generate initial populations for a genetic algorithm and study what effects the uniform coverage and the genetic diversity have on the convergence and on the nal objective function values. For our tests, we have selected one pseudo and one quasi random sequence generator and two spatial point processes: simple sequential inhibition process and nonaligned systematic sampling. In numerical experiments, we solve a set of 52 continuous test functions from 16 different function families, and analyze and discuss the results. Keywords Global optimization Continuous variables Evolutionary algorithms Initial population Random number generation

H. Maaranen Patria Aviation Oy, Lentokonetehtaantie 3, FI-35600 Halli, Finland K. Miettinen (B) Helsinki School of Economics, P.O. Box 1210, FI-00101 Helsinki, Finland e-mail: kaisa.miettinen@hse. A. Penttinen Department of Mathematics and Statistics, P.O. Box 35 (MaD), FI-40014 University of Jyvskyl, Finland

406

J Glob Optim (2007) 37:405436

1 Introduction When solving real optimization problems numerically, the solution process typically involves the phases of modeling, simulation and optimization. A simulated model of a real life problem is often complex, and the objective function to be minimized may be nonconvex and have several local minima. Then, global optimization methods are needed to prevent the stagnation to a local minimum. Therefore, in the recent years, there has been a great deal of interest in developing methods for solving global optimization problems (see, e.g., [12, 18, 32] and references therein). Here, we consider global continuous optimization problems. Genetic algorithms [15, 16, 25] are metaheuristics used for solving problems with both discrete and continuous variables. The population is the main element of genetic algorithms, and the genetic operations like crossover and mutation are just instruments for manipulating the population so that it evolves towards the nal population including a close to optimal solution. The requirements set on the population also change during the execution of the algorithm. In the recent years, genetic operators have been developed intensively (see, for example, [25, 26] and references therein). In addition, most of the theoretical studies involve the tuning or controlling of the parameters of genetic operators and their role is often considered signicantperformance-wise (see, forexample, [11]).However, the role of the initial population, which is the topic of this paper, is widely ignored. Often, the whole area of research is set aside by a statement generate an initial population, without implying how it should be done. We show with a simple example that initial populations may have effects, on the best objective function value found, and these effects may last for several generations. Then, we continue to study whether the traditional way of generating initial populations is recommendable or whether there are other point sets that give faster convergence. Our motivation is to encourage discussion onwhether one should paymore attention to the generation of initial populations. We concentrate on the casewhere there is no a priori information about the location of the global minima. Then, initial populations of genetic algorithms are traditionally generatedrandomly. In practice, genuinerandom (truly independent) points cannot be generated numerically, and instead, pseudo random points (see, for example, [14]) are used, which imitate genuine random points. However, beside [23, 24], there is, up to our knowledge, practically no research done on whether the initial population should be random. In [23, 24], quasi random sequences are used to generate initial populations for genetic algorithms. Quasi random points do not imitate random points but are designed to maximally avoid each other [33]. Quasi random sequences are used in numerical integration [5, 35, 38], computer simulations [1, 22] and quasi random searches [21, 29, 37] with good success. For example, in [2, 3], modied controlled random searches and topographical multilevel single linkage are proposed, respectively, which use quasi random sequences when generating an initial set of solutions. The comparison shows that the proposed methods perform signicantly better than the other algorithms in the comparison [2, 3]. However, in [2, 3], the inuence of the use of quasi random sequences alone cannot be estimated since also other modications are made simultaneously to the algorithms. In this paper, we single out the effect of the different initial population for genetic algorithms by keeping the rest ofalgorithm identical.We study the inuence of the initial population more generally than in [23, 24] and discuss the properties of different

J Glob Optim (2007) 37:405436

407

types of point generators and the effects that different initial populations have on the convergence and the nal objective function values. We also collect information and references about different ways of generating initial populations for those who are interested in alternativeways. For the convenience of the reader,we briey summarize ways of generating points not widely used in the eld ofoptimization. In thenumerical tests, we use a well-established pseudo random number generator, a so-called Niederreiter quasi random sequence generator, which has performed well in our earlier tests [23], simple sequential inhibition (SSI) process [9] and nonaligned systematic sampling, which originates from sampling design [34]. Collectively, we call different ways to generate initial populations point generators. The SSI process and the nonaligned systematic sampling are commonly used in statistics where point generators are called spatial point processes [9]. Spatial point processes are used, for example, in the statistical analysis of biological phenomena when simulating a distribution of a population of plants or animals (see, e.g., [9, 19] and references therein). Innumerical tests, we x the geneticalgorithm and its parameter values and change only initial populations.We use a test suite of 52 test functions from 16 different function families and test whether the differences in best nal objective function values found are statistically signicant between the different variants of genetic algorithms. We also study the convergence during the rst generations by stopping the algorithm prematurely after 10 and 20 generations. The rest of the paper is organized as follows. In Section 2, we show that the initial population may have an effect on the convergence of a genetic algorithm and give some further motivation for this work. We also shortly present the genetic algorithm used. In Section 3, we give an overview to different point generators that can be used when generating initial populations, and in Section 4, we discuss and evaluate their speed and usability as well as the coverage and the genetic diversity of the points generated. The numerical results of applying the different variants of genetic algorithms to our test suite are presented and analyzed in Section 5. In Section 6, we discuss the results, and nally, in Section 7, we conclude the paper and present some directions for future research. 2 Preliminaries Population-based genetic algorithms (see, e.g., [15] and references therein) are designed for solving problems that may have several local minima. They are very general problem solvers, which means that they can be used for solving a wide range of problems. On the other hand, they do not exploit problem-specic information, which makes them less efcient. Hence, genetic algorithms ought to be used when problem-specic methods are not available or if a wide range of problems need to be solved with a single algorithm. We consider global optimization problems of the following form minimize f (x) subject to x il x i xu i , where f : R
i

i = 1, ... , n,
l u

Rn

n R is the objective function, x are the decision variables and x , x are the vectors containing the lower and upper bounds for the decision variables, respectively.

408

J Glob Optim (2007) 37:405436

A simple genetic algorithm includes three basic genetic operations: selection, crossover and mutation. In selection, some solutions from the population are selected as parents, in crossover the parents are crossbred to produce offspring and in mutation the offspring may be altered according to mutation rules. In genetic algorithms, solutions x are called individuals and iterations of an algorithm are called generations. Many genetic algorithms also employ elitism, which means that a number of the best individuals are copied to the next population. We use a real-coded genetic algorithm that employs tournament selection, heuristic crossover, Michalewiczs nonuniform mutation and elitism. For further details of the genetic operations, see [25]. The algorithm used has the following parameters . Population size is the number of individuals in a population and elitism size is the number of ttest individuals that are copied directly to the next generation. The tness is evaluated using the objective function (tness function) value f (x). Tournament size is the number of individuals randomly picked from the whole population for the tournament selection. Crossover rate and mutation rate are probabilities on which the parents are crossbred and offspring mutated, respectively. Max generations , steps ,and tolerance are parameters for the stopping criteria. The algorithm is stopped if a maximum number of generations (max generations) is reached or if there is no change (within the tolerance) in the best objective function value during the last steps generations. Genetic algorithms are Markov chains (see, e.g., [31, 46] and references therein). Such chains are expected to converge to an equilibrium distribution independent of the initial state. However, in many applications the state space of the chain (possible point congurations in the feasible region) is extremely large and convergence can be very slow. This poses the question whether the number of the generations used is sufcient in order to achieve the equilibrium. The initial conguration is one factor in the speed of convergence. This is our motivation for studying empirically the role of initial populations. The difference in the early generationsbecomes important in solvingmany real-life problems, where the evaluation of the objective functionmay requiretime-consuming simulations and the optimization algorithm may have to be stopped prematurely. We also expect that the inuence of the initial population may carry further, in terms of generations, when solving of the problem requires a large number of function evaluations. Figures 1 and 2 illustrate the convergence of a genetic algorithm for the 10-dimensional Griewangk function and 10-dimensional Katsuura function:
10 10

Griewangk function : f (x) =


i=1 10

(x 2 i / 40 00 )
i =1 30

(cos(x i ) / i)
i k

100,

1 +i Katsuura function: f (x) =


i=1 k =1

| 2k

10 x
i

2k x 2 x |

< 1.

0.1 < x

Each curve illustrates the average convergence for 100 separate runs of a genetic algorithm. The three different curves in Figs. 1 and 2 correspond to runs with different initial populations generated by using a pseudo random number generator. Here pseudo stands for the case, where the initial pseudo random population is spread out over the whole feasible region. Furthermore, clustered 1 and clustered 2 stand for cases where the initial population is restricted to a subspace of the feasible region. The

J Glob Optim (2007) 37:405436

409

60 Best objective function value found 50 40 30 20 10 0 5 10 Generations

clustered 1 pseudo clustered 2

15

20

Fig. 1 The convergence of a genetic algorithm for 10D Griewangk function with different initial populations

2000 Best objective function value found

clustered 1 pseudo clustered 2

1500

1000

500

0 40

45

50 Generations

55

60

Fig. 2 The convergence of a genetic algorithm for 10D Katsuura function with different initial populations

subspace is dened by restricting each variable x i to the upper and lower80% of their total range for clustered 1 and clustered 2, respectively. For theGriewangk function the curves merge after 30 generations. The Katsuura function requires more generations and, therefore, also the curves merge later than for the Griewangk function. (Note the different scales in Figs. 1 and 2.) For both Griewangk and Katsuura functions, the differences in the best objective function values found is quite signicant when the number of generations is small. This indicates that the initial population has an effect on the convergence of a genetic algorithm. The clustered populations used in the simple example above were only theoretical. In practice, such initial populations would hardly be used, unless there were some

410

J Glob Optim (2007) 37:405436

a priori knowledge about the location of the global minima. The rest of the paper concentrates on more realistic alternative ways for generating initial populations. When there is no a priori information available on the location and the number of local optima, then the initial population of a genetic algorithm should be able to reach as large part of the feasible region as possible by means of crossover. We call this property genetic diversity , and it is related to the independence of points.Another desirable property for an initial population is a good uniform coverage. By a good uniform coverage we mean that the points are well spread out to cover the whole feasible region. Points have a gooduniform coverage if they do notform clusters or leave relatively large areas of the feasible region unexplored. A good uniform coverage is desired, because then information is obtained throughout the whole feasible region. This helps to prevent premature convergence. Genetic diversity and a gooduniform coverage are in practice conicting objectives, because an optimal uniform coverage is often achieved using systematic structures, which limit the set of possible offspring. For example, it is known that for a closely related problem of sphere packing a triangular grid provides the optimal packing in a 2-dimensional case (see, e.g., [44]). Hence, if the pattern was repeated, the distance between points would reach its maximum, when using triangular grid. However, the set of possible offspring is limited as illustrated in Fig. 3. This happens also if the sample points form clusters. Therefore, seemingly random (independent) point sets with no clustering are preferred, since they provide information over the whole feasible region and a large part of the feasible region can be reached by the means of crossover. Figure 3 illustrates a rectangular grid, triangular grid, clustered points, and seemingly random points with no clustering. In the gure, the points marked with

Fig. 3

Coverage of the connecting lines with different patterns

J Glob Optim (2007) 37:405436

411

small black dots are parent solutions and the lines running through them illustrate all the possible locations of offsprings. 3 Generating initial population Many sub-areas of geneticalgorithms have been studied elaborately, but the selection of an initial population has been widely ignored.Asmentioned earlier, the traditional way to generate the initial population is to use pseudo random numbers, and more recently also quasi random sequences have been applied in [23, 24]. Pseudo random numbers imitate genuine random numbers and quasi random sequences are designed to produce points that maximally avoid each other. Pseudo random initial populations can be generated in numerous ways. The main classes are congruential and recursive generators. Common congruential generators include linear, quadratic, inversive, additive and parallel linear congruential generators [14, 45]. Recursive generators include multiplicative recursive, lagged Fibonacci, multiply-with-carry-generator, add-with-carry and substract-with-borrow generators [14]. There are also pseudo random vector generators, which produce sequences of vectors instead of scalars.Examples of those are feedback shift register generator [14] and SQRT generator [43]. Common quasi random sequence generators include Van der Corput, Hammersley, Halton, Faure, Sobol and Niederreiter generators [4, 14, 30]. For the convenience of the reader some examples of both pseudo random number generators and quasi random sequence generators are included in the Appendix along with further references. Pseudo random numbers and quasi random sequences are quite well-known for researchers in optimization, but there are also other types of point generators. Spatial point processes are commonly used in statistics but they are less well-known in optimization. Therefore, we now shortly describe the spatial point processes used in this paper. 3.1 Spatial point processes Spatial point processes [9] can be considered to betransformations of pseudorandom point initialization which lead to a good uniform coverage and simultaneously avoid periodicity in the conguration generated. Some spatial point processes include a parameter, by which the proportion of the two conicting properties, genetic diversity and a good uniform coverage, can be controlled. Hence, the same process can be used to generate genetically diverse points or points with a good uniform coverage or something in between. There are several types of spatial point processes. For our purposes, it is adequate to differentiate between clustering and inhibition processes. Clustering processes generate points that simulate populations, where the individuals tend to form clusters. However, we are more interested in the situation where each individual is as far apart as possible from the adjacent individuals resulting in an evenly distributed population without clusters. Therefore, we concentrate on inhibition processes, where an individual (a point) is either explicitly prohibited to be located closer than some predened minimum distance > 0 to the other individuals, or individuals are kept apart by some implicit means. We describe here two inhibition

412

J Glob Optim (2007) 37:405436

processes: a simple sequential inhibition process and a nonaligned systematic sampling. Simple sequential inhibition process: In the simple sequential inhibition (SSI) process [9], a new individual is accepted to enter the population only if its distance to all the existing individuals in the population is at least . In the following pseudo code for the SSI process the parameter population size is the number of points to be generated, futile is the number of rejected trial points during the current iteration, Max_futile is the maximum number of rejected trial points in one iteration before terminating the process, and n is the dimension of the space where the points are generated. 0. Initialize parameters population size ,Max_futile and dimension n and set futile = 0 and k = 0. 1. Do until k = = population size or futile==Max_futile 1.1 Generate a trial point. (Use a pseudo or a quasi random number generator to generate a trial point to the n-dimensional unit hyper cube.) 1.2 Check whether the trial point is accepted. If the distance to the existing individuals is larger than , accept the point into the population, and set k = k + 1 a nd futile = 0. Else, set futile =futile +1. End do The distance between points can becomputed using differentmetrics. If theminimum distance is dened to be too large, then themaximumnumber of rejected trial points in one iteration is reached and the process is terminated prematurely. In that case, we supplement the sample with pseudo random points to match the population size. Nonaligned systematic sampling: In the nonaligned systematic sampling, which originates from sampling design [34], the unit hyper cube is divided into b n elementary intervals (see Appendix) with equal side lengths (i.e., an equally spaced grid). Then one sample point is selected from each elementary interval according to prescribed rules. Nonaligned systematic sampling uses b n pseudo random numbers to dene the location of the sample points. In two dimensions, the sample points are generated using the formula x = ([(j 1)+r
i,1

] , [(i 1)+r

j,2 ] ) ,

where i, j = 1, ... , b, = 1 / b , a nd r is a b n array of pseudo random numbers, see Fig. 4. In the following pseudo code for an n-dimensional case of the nonaligned systematic sampling, v is a vector of auxiliary integer variables. It is used to determine the correct elementary interval and the correct pseudo random number when calculating the nonaligned systematic sample points x. 0. Initialize v j = 0for j = 0, ... , n 1, and the pseudo random number array . 1. Do i = 0, ... , bn 1. 1.1 Compute sample point x i Do j = 0, ... , n 1 // Compute the jth component of x i n S = k =1 v k v j li = (S)mod b x j = ( r l, j + v j ) End do

413

Fig. 4 Nonaligned systematic sampling points in two dimensions

1.2 Update v (i),mod b j = = bj 1) then v j Do j = If 0, ( ... n 1 End do j = (v + 1 ) mod b End do Note that freely, but is tain number sample with the number of nonaligned systematic sampling points cannot be chosen determined by the grid size and the dimension. In practice, when a cerof sample points is required, we supplement the nonaligned systematic pseudo random points to match the required number of points.

4 Properties of point generators In this section, we consider the distribution and the genetic diversity of the points generated by pseudo random number generators, quasi random sequence generators and spatial point processes. We also consider the speed and the usability of some specic generators. We are interested in point sets with a relatively small number of points since the number of points generated equals the population size and ranges typically from tens to a few hundreds. In the illustrations and numerical examples of point generation, we have selected good representatives for the different types of point generators. The pseudo random number generator is a multiplicative linear congruential generator from the wellestablished numerical library of the Numerical Algorithms Group Ltd (NAG) and the quasi random sequence generator is the Niederreiter generator [30] that has proved successful in our earlier tests [23, 24]. The representatives of spatial point processes are the SSI process and the nonaligned systematic sampling. In the SSI process, the trial points are generated using the Niederreiter generator and the distances are measured using L 2 -metric. The nonaligned systematic sampling is as dened in Section 3. The properties of the point generators are described in the following four subsections and the results are summarized and some conclusions are drawn in the fth subsection.

414

J Glob Optim (2007) 37:405436

4.1 Distribution Intuitively thinking, it is benecial if no large areas are left unexplored when sampling individuals for an initial population of a genetic algorithm. The point generators presented in Section 3 produce point clouds with a degree ofuniform coverage.However, there are differences in how well the points of these sequences are spread out. In optimization and in numerical integration, the goodness of the uniform coverage of a point set is commonly measured by discrepancy or dispersion [30]. Here, we give the denition for the discrepancy. Discrepancy Let I n Rn 1 be an n-dimensional unit hyper cube, let P be a set consistN In ing of points x , n..., x ,andlet B be a nonempty family of Lebesgue-measurable k ,1 as the number points xB N subintervals of I of and B . Furthermore, let A (B ; P )be a counting function dened A(B;P) n k N N, for whichB xk B . Then, discrepancy D N with respect to P in I is dened as D ( B ; P ) = sup Lebesgue-measure. Discrepancy is large, when there exists clusters or large unexplored areas. Hence, we are interested in point sets that have low values for discrepancy. A mathematical relationship between discrepancy and dispersion is given in [30], and it shows that every low-discrepancy sequence is also a low-dispersion sequence, but not n vice 1 versa. Quasi random sequences are also called low-discrepancy sequences. Their discrepancy value for a large sample size N is of order of magnitude C ( log N ) N , w here C is a generator-specic coefcient depending only on the dimension n [27]. This is also a minimum possible discrepancy size for large N (see, e.g., [27]). The bounds for discrepancy, however, are relevant only for a very large number of sample points as used in numerical integration. When generating only initial populations, we are m more interested in the distribution of a small number of points. Some quasi random sequences have the property that the rst b successive points divide evenly on the correspondingelementary intervals (seeAppendix). This indicates that quasirandom sequencesmay have good discrepancy values also forsmall sample sizes. Figure 5 illustrates four two-dimensional initial populations with 1,024 individuals generated using the multiplicative linear congruential (pseudo) generator, the Niederreiter generator, the SSI process and the nonaligned systematic sampling, respectively. In Fig. 5, we see that pseudo random points form clusters and leave some areas relatively unexplored, whereas the other samples cover the feasible region quite well. However, quasi random sequences may sometimes form point patterns, whose distribution properties depend on the number of points generated. Then, at a certain number of sample points the pattern may be unfavorable for optimization purposes. Anexample of this behavior is illustrated in Fig. 6, where again 1,024 points were generatedwith a slightly different conguration. Formore information on the distribution of different quasi random sequences, we refer to [27]. Next, we consider the distribution of small sample sizes. In optimization with geneticalgorithms we used a population of 201 individuals. Therefore, we nextempirically examine the distribution properties of sets with 201 points generated in four different ways. Henceforth, in tables and gures, we will use abbreviations Pseudo, Nieder, SSI and Nonalig for the representatives selected from different types of point generators (see the beginning of this section).
B

( B ) , w here is a

J Glob Optim (2007) 37:405436

415

Pseudo generator

Niederreiter generator

0.5

0.5

0.5 SSI process

0.5

0.5

0.5

0.5

0.5 Nonaligned systematic sampling

Fig. 5

Point sets of 1,024 points with different generators

We consider 2 and 10-dimensional cases to show how the distribution properties may change with the dimension. First, we use an empirical empty space statistic to examine how well the sample points cover the feasible region [34]. Empty space

statistic function ess is dened as ess(r ) = 1 Pr ( B ( x , r ) is empty), where x is a randomly chosen point, B(x, r) is an x-centered ball with radius r and Pr denotes the probability. Hence, assume there is a point set S and further assume there is a ball B with radius r and whose center point is randomly chosen from the feasible region (in our case, the unit hyper cube). Then, for each radius r, the empty space statistic function tells what the probability is that the ball B is empty, that is, the intersection of the ball B and the point set S is empty.

J Glob Optim (2007) 37:405436

Fig. 6 Another example of 1,024 points generated with the Niederreiter generator

Niederreiter generator

0.5

To experimentally dene the empty space statistic functions set con0 0.5 for each point 1 taining 201 points, we generated 10,000 auxiliary pseudo random points on a unit hyper cube and for each of the 10,000 auxiliary points calculated the maximal radius r so that B(x, r) is empty, that is, does not contain any of the 201 points under consid1 eration. We did this in both 2- and 10-dimensional cases for the four different point The empirical empty space statistic functions are illustrated in Fig. 7. For us, the important property of the empty space statistic function is the steepness of the curve sets. (the steeper the better). If the curve is steep, it indicates that if a random ball with radius r is selected from the unit hyper cube, then whether the ball is empty depends mainly on the radiusnot the location of the ball. If, on the other hand, the curve is gentle, the emptiness of the ball depends strongly on the location, hence the point set is not evenly distributed. The empiricalempty space statistic functions show that in two dimensions, the nonaligned systematic sampling and the simple sequential inhibition (SSI) process provide the best coverages closely followed by the Niederreiter quasi random sequence generator. The pseudo random initial population has clearly the worst coverage, which conrms the ndings earlier shown in Fig. 5. However, in ten dimensions the differences diminish, because the populations become sparse. The only notable difference is that for the population generated with the SSI process the curve of the empirical empty space statistic function is now below the other curves with the small values of r but rises more steeply in the end. This indicates better coverage for the SSI process. 4.2 Genetic diversity The population of a genetic algorithm evolves largely by crossovers and mutations. In our implementation, the most dominant genetic operator is crossover, since it usually changes the solutions most. We dene genetic diversity as a property by which the
1 Here, we naturally used a different pseudo random number generator than when generating the

initial population. To make sure that the use of pseudo random numbers did not bias the results we also computed the empty space statistic functions using a grid of 10,000 points, which led to similar results.

J Glob Optim (2007) 37:405436

417

2D 1 0.8 0.6 0.4 0.2 0 0 Pseudo Nieder SSI Nonalig 0.05 0.1 Radius r 0.15 0.2 ess(r) 1 0.8 0.6 0.4 0.2 0

10D

ess(r)

Pseudo Nieder SSI Nonalig 0 0.2 0.4 0.6 0.8 1 1.2 1.4

Radius r
Fig. 7 Empirical empty space statistic functions

genetic algorithm is able to reach as large a part of the feasible region as possible by means of crossover. Genetically more diverse initial populations are preferable. Next,we concentrate on the independence of points,which is onemain criterion for genetic diversity.Asmentioned earlier, truly independent points cannot be generated algorithmically. The pseudorandomnumbers try to imitate independentnumbers, yet they are deterministically generated by an algorithm. For example, it has been shown that the pseudo random points generated by a linear congruential generator lie on a simple lattice (see, e.g., [14] and references therein). It is also known that bad choices for the coefcients of a linear congruential generator may result in a very bad lattice structure (see, e.g., [14]). The choices for the coefcients play a crucial role for all the 1 pseudo random number generators. n n, Also quasi random sequences are generated n algorithmically. However, the idea of n quasi random points is not to imitate random points but to try to cover the whole search space yet maintaining a certain degree of randomness. These initial settings speak for a better genetic diversity for pseudo random initial populations. In empirical tests, when comparing pseudo and quasi initial populations, we can study plots of 2-dimensional populations or plots of n-dimensional populations that are projected to two dimensions. In Figs. 5 and 6 we can notice that the Niederreiter generator

produces points that have a more distinguishable pattern whereas pseudo random points have a pattern that seems more random. We studied the independence of point sets of 201 2-dimensional points using Ripleys K-function and L-function [34]. The K-function is dened as follows: K ( r ) is the expected number of further points within a ball with radius r and with a center at a randomly chosen point ( is the expected number of points in a unit hyper cube). L-function is obtained from the K-function through transformation L ( r ) = ( K ( r )/ )

where is the volume of the unit ball in R . Figure 8 illustrates the L-functions for the pseudo random points, the Niederreiter quasi random points, points from the SSI process and the nonaligned systematic sampling points. For independent points

418 Fig. 8 L-functions for different initial populations

J Glob Optim (2007) 37:405436

Four Lfunctions
0.25

0.2

L function

0.15

0.1 Pseudo Nieder SSI Nonalig 0 0.05 0.1 0.15 Distance r 0.2 0.25

0.05

the L-function is a straight line L (r)=r . We can see that pseudo random points imitate well independent points, whereas Niederreiter quasi random points and the two spatial point processes lack the points falling close to each other. Based on these observations, we conclude that the pseudo random generator produces genetically more diverse points than the other point generators. The points generated by the SSI process are essentially pseudo random points. However, the trial points falling too close to each other are not accepted to the population and this affects the genetic diversity. The genetic diversity versus a gooduniform coverage can be controlled using theminimum distanceparameter . In what follows, we use the term probabilistic maximal minimum distance (PMMD) and we should nd the value of PMMD so that a desired number of points can t in the feasible region with a given probability. In the 2-dimensional example in Fig. 8, we can see that, except for small distances, the points of the SSI process fall on a straight line. The nonaligned systematic sampling uses only n b pseudo random points, where n is the dimension and 1/b is the side length of the elementary interval (the distance between two adjacent grid points). Therefore, there is not much genetic diversity in nonaligned systematic sample points. In Fig. 8, we see this as an oscillating behavior of the L-function. Furthermore, the number of nonaligned systematic sample points increases quickly with the dimension if the grid size is kept constant. For example, in 10 or 20-dimensional cases, even if we use only two grid points along each dimension, we get 2 10 = 1, 024 and 220 = 1, 048, 576 sample points, respectively. When using a population size of 201 individuals, we get in practice only one genuinesystematicsample point in 8 or more dimensions (the rest are supplemented pseudo random points). This means that, in large dimensions, the nonaligned systematic sampling reduces to pseudo random sampling. 4.3 Speed For some applications the speed of the generator may be of importance. The CPU times for the tested generators and the different numbers of points generated in dimensions 2, 5, 10, 20, 50 are shown in Table 1. The test runs were performed on

J Glob Optim (2007) 37:405436 Tab le 1 CPU times in seconds Dim 2 2 5 2 5 10 5 10 10 20 20 20 50 50 50 Points 201
4 5

419 SSI Pseudo 0.00 0 0.01 0.00 0 0.06 0 0.02 0.00 0 0.14 0.03 0.28 0.00 0.06 0.55 0.01 0.14 1.38 Nieder 0.00 0.00 0.02 0.00 0.00 0.03 0.00 0.01 0.05 0.01 0.01 0.09 0.00 0.02 0.2 0.06 17.60 1774.0 0.2 17.84 1760.0 0.23 17.71 1773.0 0.73 10.78 21.33 1.54 44.50 45.68 Nonalig 0.00 0.02 0.23 0.00 0.06 0.67 0.00 0.04 0.98 0.00 0.05 0.46 0.01 0.12 1.2

1 201 1 4 1 201 1 4 10 10 5 201 10 4 10 5 201 10 4 10 5


5

an HP9000/J5600 computer, and the CPU times were obtained using NAG routine X05BAF, which returns the CPU time with the accuracy of 0.01 s. For this reason, differences betweensome generators become evident only when using a largenumber of points. The column Dim in Table 1 indicates thedimensions of the points generated and the column Points lists the number of points generated. The abbreviations for the generators are the same as earlier, and the CPU times are given in seconds. In Table 1, we see that the SSI process is by far the slowest resulting from the implementation which emphasizes the good coverage at the expense of speed. Our implementation allows a maximum of 2,000 futile trial points at each iteration before terminating the process. The speed (and the distribution of points) for the SSI process depends strongly on the probabilistic maximal minimum distance PMMD. If PMMD is selectedmaximally large (i.e., PMMD is as large as possible yet so that the algorithm does not have to supplement the population with pseudo random points, see Section 3.1), then a good coverage of the points is emphasized and the number of futile tries increases slowing down the process. Our implementation of the SSI process automatically estimates the maximally large PMMD. The values in Table 1 indicate that and 50. the estimate for PMMD is probably not good (distribution-wise) for dimensions 20 Table 1 shows that the implementation of the Niederreiter quasi random number generator is the fastest followed closely by the pseudo random number generator and the nonaligned systematic sampling process. In optimization, where only a relatively small number of points is generated, there is no practical difference between the speed of these three generators. For pseudo and quasi random number generators the speed seems to be directly proportional to the number of points generated. We can see that, compared to other generators, the speed of the nonaligned systematic sampling grows for largerdimensions. This can be explained by different ratios of genuine nonaligned sampling points. As mentioned earlier, the population of nonaligned systematic sampling points is supplemented by pseudo random points to match the required population size. Hence, for example, in 5 dimensions there are 78% and 100% of genuine nonaligned systematic sampling points in sets of 10,000 and 100,000

420

J Glob Optim (2007) 37:405436

sample points, respectively, whereas in 10 dimensions the respective values are only 10% and59%. In 20 dimensions, all except onesample point are already random even if the sample size was as large as one million (2 20 >10 6 ). 4.4 Usability In this section, we discuss the usability of different number generators. Some features affecting usability have already been mentioned earlier and are summarized here. Pseudo random number generators are by far the most commonly used ones in optimization. The greatest advantage of pseudo random number generators concerning usability is that there exist implementations that are well-established, well-tested and easily available.However, a goodrandom-like behavior is not amatter-of-course, but requires a careful choice of parameters, that is, coefcients and seeds (or initial sequences). As earlier noted for linear congruential generators, wrong choices for parameter values may cause the points in a sequence to be badly distributed. This is true for other generators as well. Many of the pseudo random number generators are easy to implement, but the analysis of the distribution is difcult, especially for an arbitrary seed (or arbitrary initial sequences). According to [14], only well-tested pseudo random generators should be used. These generators often use xed coefcients and sometimeslimit the choice of seeds (or initial sequences). This is one way to secure good distribution properties, assuming those values are tested and approved. Contrary to pseudo random number generators, quasi random number generators are neither so well-established, well-tested nor easily available. On the other hand, quasi random number generators having solid theoretical properties do not need so much numerical testing. The use of available quasi generators is easy and the quasi random points are guaranteed to have some advantageous properties as described in the Appendix. Quasi generators provide the same sequences on different runs of the generator and, hence, they do not require a seed or an initial sequence from the user. Since quasi generators take into consideration the location of the previous points, they also expect that the sequence is startedfrom the beginning. These two properties make quasi generators easier to use, because the user only has to give the number of points to be generated and the dimension as parameters. At the same time, these properties may become a disadvantage if the user for some reason wants to obtain different sequences in the same dimension. On this occasion, the easiest way out might be to generate points in higher dimensions and then projectthem to the desired dimension, but this may have an effect on the distribution properties. The SSI process is not commonly available for an n-dimensional case, but the implementation is straightforward given that the user provides the PMMD value. However, it gets more complicated if the user provides only the number of points to be generated and theoptimalPMMD(with respect to the coverage)must beestimated automatically. This issue is discussed in more detail in Section 6. An interesting property of the SSI process is that the proportion of the two conicting objectives, that is, the genetic diversity and the good uniform coverage, can both be controlled using the probabilistic maximal minimum distance parameter PMMD. Note that, if desired, the SSI process provides different points on every run just like pseudo random number generators. The nonaligned systematic sampling processes are not commonly available in n-dimensions, but they are easy to implement and to use. However, when

J Glob Optim (2007) 37:405436

421

generating a relatively small number of points, it is practical to use the systematic sampling process only in small dimensions. Otherwise, the proportion of genuine nonaligned systematic sample points is small. This is a strong limitation in the area of optimization. 4.5 Summary of features An ideal generator for our purposes should generate well-distributed, genetically diverse points in n-dimensions, and it should be relatively fast and easy to use. In Table 2, we summarize the evaluation made in this section. The evaluation of different point generators has been marked using plus signs (+): the more plus signs the better the score. In case of a notable difference in how well a generator works for problems with small and large dimensions (here small denotes less than, say, ve), the occasions are scored separately (small/large). For example, the nonaligned systematic sampling scores +++/+ in coverage,whichmeans that itworkswell insmall dimensions and poorly in large dimensions. Note that only coverage and genetic diversitymay have a direct inuence on objective function values inoptimization since they are the properties of the points whereas speed and usability are properties of the generators. Considering just the properties of the points we notice in Table 2 that pseudo random points have good genetic diversity, but the worst coverage, and the SSI process produces points with good coverage, but only average genetic diversity. The properties of the Niederreiter quasi random points settle somewhere between pseudo and the SSI process. In the further analysis, we pay less attention to the nonaligned systematic sampling since it is not applicable for problems with several variables. To nd out the effects of the coverage and genetic diversity, we concentrate on the pseudo random number generator and the SSI process. 5 Experimentations We test the inuence of initial populations computationally by using different point generators and a largenumber of difcult test problems from the literature. The inuence of the different initial populations is analyzed after 10 and 20 generations and after the execution of the whole algorithm. The genetic algorithm used is presented in Section 2. 5.1 Test settings The test runs wereperformed on anHP9000/J5600computer.We solved a test suite of 52problems using geneticalgorithms described in Section 2 with theparameter values
Tab le 2 Summary of generator properties Coverage Properties (small/large) Speed Usability Genetic diversity + Pseudo +++ +++ +++ ++ Nieder ++ +++ +++ +++ SSI ++ + ++ +++/+ Nonalig + ++/+++ +++/+

422 Tab le 3 Parameter values for genetic algorithms Parameter Population size Elitism size Tournament size Crossover rate Mutation rate Max generations Steps To le ran ce

J Glob Optim (2007) 37:405436 Value 201 21 3 0.8 0.1 100 10, 20 and 2000 1 0 7

given in Table 3. The only difference between the variants of a genetic algorithm used was the way the initial population was generated. Each problem was solved 10 times with each algorithm, when the algorithms were let run until the stopping criteria were satised, and 100times when thealgorithms were stopped prematurely after 10 and 20 iterations. In each run, we recorded the best objective function value in the last population and study them in what follows. The names of the function families, number of variables, box constraints used and references are collected in Table 4. If only one interval is given for the box constraints in Table 4, then each variable was restricted to the same interval. The test problems are the same as used in [24]. They are divided into two problem sets according to the number of variables so that problems with 10 or less variables are here called small and others large. Originally, initial populationswere generated in altogether 21 differentways including 13 variants of the SSI process, 5 quasi random number generators, a nonaligned systematic sampling and a pseudo random number generator. The variants of the SSI process differed in the random number generator used when generating the trial points and in the metric used when computing the probabilistic maximal minimum distance PMMD. The quasi random number generators tested were the Niederreiter,
Tab le 4 # 13 47 811 12 1317 1822 23 24 2527 28 29 30 3134 3538 3942 4346 4749 5052 Summary of the test problems Name Dimensions 2, 5, 10 Rastrigin Schwefel Michalewicz 2 6, 10, 20, 30 6, 10, 20, 50 2 2, 6, 10, 20, 50 4 5, 2, 6, 6, 7 10, 20, 30 3 5 2 4, 6, 10, 20 4, 6, 10, 20 5, 5, 10, 10 3, 4, 10, 20 4, 4, 4 2, 5, 10 [0, ]constraints Box [600, 400] [500, 500] [5, 10][0, 15] 700, 500] [10, 10] [5, 5] 38.768] 30.768, [10, 10] 100, 100] [5, 5] [10, 10] [1, 1] [0, 10] [20, 20] 900, 1100] [0, 10] [0, ]

Ref. [13] [13] [13] [13] [13] [13] [13] [40] [40] [7] [7] [40] [10] [10] [41, 13] [42] [40] [41]

Branin rcos Easom Griewangk Levy Ackleys Path Levy P8 P16 Hansen Corona Katsuura Shekel Langerman Funtion 10 Epistatic Michalewicz

J Glob Optim (2007) 37:405436

423

Faure and Halton generator and two variants of the Sobol generator. The numerical results of the genetic algorithm with quasi random initial populations are reported in [23], where the Niederreiter generator performed the best. Here, we report the results of the genetic algorithm using initial populations generated by the good representatives from different types of point generators (see the beginning of Section 4). 5.2 Numerical results In this subsection, we present the results of the experiments described above. In what follows, we consider only the best objective function values that the algorithm has found (in the last population) and pay attention to the average and variance of these values in the repeated runs. Let us rst consider those tests where the geneticalgorithm was run until a stopping criterion was satised. Table 5 shows the average nal objective function values f (x) and the average standard deviations (f )for small and largeproblems.As we can see, for smallproblems, in the average objective function values, there are no differences in three signicant digits. However, for large problems the differences in average objective function values are quite large. Based solely on the average objective function values in Table 5 one could draw the conclusion that nonaligned systematic samples and Niederreiter quasi random points are clearly the most suitable for initial populations. This, however, is not completely true as our further analysis reveals, when we consider also the variances. Before any statistical tests, we make an observation on the results in Table 5. As mentioned in Section 4.2, nonaligned systematic sampling reduces to pseudo random sampling for large problems. However, the average objective function value and the average standard deviation of those algorithms differ quiteremarkably for large problems. This indicates that the average values are not good measures in the comparison. We applied analysis of variance (ANOVA) to the nal objective function values to nd out whether the differences were statistically signicant. The analysis of variance indicated that only in 4 out of 52 test problems there were statistically signicant differences (with95% condence).We call these four problems critical problems ,and they are the problems 40, 49, 51 and 52 in Table 4. Surprisingly, all the critical problems were small (problems with 10 or less variables). This can be explained by the small variance for these problems. The fact that there was no statistically signicant difference in the objective function values for any of the large problems is explained by their very large variances. The variances were the largest for the 20-dimensional Katsuura problem for which the variances were 8,473, 2,060, 509 and 17,100 for the pseudo, Niederreiter, SSI and nonaligned, respectively. The 20-dimensional Katsuura

Tab le 5 Average objective function values and standard deviations for small and large problems

Pseudo f (x)Small f (x)Large (f )Small ( f )Large 186.7 1896.0 0.34 885.9

Nieder 186.7 2205.4 0.28 355.3

SSI 186.8 1680.2 0.37 162.5

Nonalig 186.6 2295.9 0.58 1697.9

424

J Glob Optim (2007) 37:405436

problem biased the average values in Table 5 and was the main cause of differences for large problems. Next, we study the critical problems more closely. Figure 9 shows the average objective function values for the four critical problems. Ps, Ni, SSI and No stand for the pseudo, Niederreiter, SSI process and nonaligned systematic sampling, respectively, and the whiskers illustrate the standard deviations. In addition, the P values related to the F test of ANOVA [28] are given in the parenthesis subsequent to the problem number. The four plots reveal that there is no overall dominance between the geneticalgorithm variants, however the variant applying quasirandom points performed best in 3 out of 4 cases and also points generated by the SSI process performed better than pseudo random points in 3 out of 4 instances. Nonaligned systematic sampling performed very similarly to pseudo random points in three cases and was better in one. So far we have studied the differences in objective function values after running the whole algorithm and there has been only minor differences. Next, we consider also the situations, where the algorithm is stopped prematurely after 10 and 20 generations. Earlier, we noticed difculties in interpreting values that were not scaled: one large value could strongly bias the results. Therefore, we now dene a more reliable measure of performance by scaling the objective function values to the range from 0 to 1. Scaled average objective function value sf is dened as follows sf = f f min f max f min ,

Problem 40 (P=0.007) -0.6 -2 -6 -0.8 -10 -14 Ps Ni SSI No -8.8 Ps Ni SSI No Problem 49 (P=0.040)

-1

Problem 51 (P=0.001) -4.5

-9.2 -4.6 -9.6 -4.7 Ps Ni SSI No Ps Ni SSI No Problem 52 (P=0.043)

Fig. 9

Average objective function values and standard deviations

J Glob Optim (2007) 37:405436

425

where f min is the best objective function value known (or found after running all the four whole algorithms), and f max is the worst of the best objective function value found during the repeated runs (either for a xed number of generations or till a stopping criterion). It is worth noting that f max changes each time the stopping criteria are changed, but f min typically remains the same (at least if it is known). Finally, a bar above a function symbol denotes the average value over all the considered test min problems (small or large), where the difference between (unscaled values of) f and max f was more than 0.1. Table 6 shows the scaled average objective function values for small and large problems. We remind again that for large dimensional problems the nonaligned systematic sampling points and pseudo random points are essentially the same, and their results are here reported only for the sake of completeness. We now notice in Table 6 that pseudo and nonaligned systematic sampling receive similar values, which gives indication that the values are more reliable after scaling. The scaled average f min function value sf becomes unstable near theoptimum f max objective since the divisor approaches zero and, therefore, the values for 10 and 20 generations are more reliable than the values after running the whole algorithm. Nevertheless, Table 6 shows an interesting trend: the genetic algorithm variants that converge relatively fast in the rst generations are not necessarily the ones obtaining the best nal results. This is particularly true for the SSI process, which had the best empty space statistic values in Section 4 and in the numerical results in Table 6 receives worst values during the rst generations, but good values at the end. Pseudo random points, on the other hand, had the best genetic diversity, and, in Table 6, they receive good values during the rst iterations. The Niederreiter quasi random points had above average genetic diversity and above average coverage and their performance in numerical tests in Table 6 is above average except for one instance. Despite the observations made above, the differences in the results shown in Table 6 are debatable. We applied ANOVA also for the intermediate results after 10 and 20 generations. Again, we call those problems critical, where there were statistical differences in the objective function values. However, we now exclude the problems, where the differences in objective function values were less than 0.1. There were all together 38 critical problems after 10 generations and 9 after 20 generations. In Table 7, we report the number of critical problems for which each algorithm had the best average intermediate objective function value. The differences are not large, but pseudo random (and results. nonaligned systematic sampling) initial population seem to provide best intermediate

Tab le 6 Scaled average objective function values after all Generations all 10 20 10 20

10 and 20 generations and after running the whole algorithm

sf (x)Small sf (x)Small sf (x)Small sf (x)Large sf (x)Large

0.90 0.85 0.52 0.84 0.91 Pseudo 0.45

0.91 0.88 0.38 0.85 0.90 Nieder 0.61

0.9

0.87

SSI 0.95 0.43 0.99 0.99

Nonalig 0.89 0.57 0.84 0.87

sf (x)Large

0.38

0.42

426 Tab le 7 The number of problems with best average objective function values for critical problems Generations 10 20 Pseudo 9 5

J Glob Optim (2007) 37:405436 Total Nieder 9 1 SSI 8 0 Nonalig 12 3 38 9

Before summarizing the results, let us yet consider the magnitudes of the variances in the best objective function values separately from the actual objective function values. In Table 8 are reported the number of test problems for which each algorithm had the smallest variance after 10 and 20 generations and after running the whole algorithms. We have excluded the problems where the differences in variances were less than 0.1. The total number of problems considered is reported in the last column of Table 8. It is noteworthy that the variant of genetic algorithm using pseudo random numbers has most often the smallest variance for the intermediate solutions, which was not expected. 5.3 Summary of numerical results Here, we summarize the analysis made above.When considering nal and intermediate average objective function values, the analysis of variance indicated four critical problems for the nal results and many more for the intermediate results where there were signicant differences in the best objective function values found. None of the methods clearlydominated the others, but in the scaled objective function values there was a noticeable trend, namely that pseudo random initial populations provided on the average good intermediate results, but not so good nal results. The converse was true for the SSI process and quasi random points with better uniform coverage. Earlier we noticed that some clusterizations were harmful and some benecial. Hence, genetic algorithms using pseudo random numbers and the variants where the initial populations have better uniform coverage performed in a slightly different way. Nevertheless, the differences, in general, were debatable and no strong conclusions could be made. As far as the magnitudes of variances in the intermediate results are concerned, in most of the cases, the variance was the smallest for the genetic algorithm using pseudo random numbers. There may be both harmful and benecial clusterizations as seen in Figs. 1 and 2. Harmful clusterizations lead to inferior and benecial lead to superior objective function values compared to initial populations without clustering. One could have expected this to imply more signicant differences in the magnitudes of variances in the early generations but this did not happen. However, again, none of the algorithms clearly dominated the others and the differences were not large. Finally, there were no signicant differences in the magnitudes of variances related to nal objective function values when applying clustered or unclustered initial popuTab le 8 Number of test problems when the variance Total 10 20 Generations Final 18 12 Pseudo 5 11 11 Nieder 9 8 6 SSI 3 9 15 Nonalig 2 46 44 19

was smallest

J Glob Optim (2007) 37:405436

427

lations.This could be assumed keeping inmind the observation that geneticalgorithms areMarkov chains and, hence, converge independently of the initial population.However, the conclusion is not trivial, since actually theMarkov chain property only guarantees convergence when thenumber of generations approaches innity. The last row in Table 8 shows that the geneticalgorithm applying pseudorandom initial population has more often smallest variance than the one applying the SSI process, but less often than the one applying quasi random sequences. Hence, the clusterization of the initial population did not seem to have strong effect on the variance of the nal objective function values. Despite some differences, the results indicate that pseudo random points perform rather well when used in initial populations of genetic algorithms. Moreover, if the minor differences found in this research can be generalized, then they speak for pseudo random points in cases when the algorithm is stopped after relatively small number of generations, which may be the case with some computationally expensive real life problems. 6 Discussion In this section, we consider fourproblematic issues that we have touched earlier. First, we point out that dimensionality plays a large role in this research. The initial populations become sparse when the optimization problem has more than few variables. A simple example of this can be seen with unit hyper cubes in three and two dimensions. In three dimensions, eight point can be located in the corners of the cube so that their maximal minimum distance is always one. But locating eight points in the line segment of length one means that the maximal minimum distance between the points is 1/7. The sparsity in higher dimensions means that solutions in the initial population are likely to be relatively far from the global optimum no matter what uniform distribution pattern is used. Moreover, in sparse populations, when the minimum distance between two adjacent points in the population is maximized, that is, when the points in the population try tomaximally avoid each other, then the population is likely to be concentrated to the borders of the feasible region, whichmay not always be desirable. This phenomenon is typical of the SSI process and could also be seen in the example mentioned above. The second issue is of more technical nature and concerns the SSI process. The SSI process can produce points that are either genetically diverse or have good uniform coverage, depending on the distance parameter PMMD. We optimized the coverage, but did not test values forPMMD that would compromise between the two conicting objectives. Smaller values for would also make the process considerably faster. For example, in Fig. 10a we have seven point in an interval of length one with = 1/7. We know that one more point can be included but how many random points should be generated before the right location is found? When optimizing the coverage for the SSI process the automatic determination of a maximal minimum distance PMMD is problematic. We did not nd good theoretical estimates for PMMD and the theoretical estimates we tested did not work well
Fig. 10 Difculties with speed and approximation of

a)

b)

428

J Glob Optim (2007) 37:405436

in practice. Hence, we used experimental estimates. In our current implementation, we have numerically solved maximal values of PMMD for some number of points in some dimensions, and we approximate maximal PMMD elsewhere using interpolation. Examples of values used for in our experiments for different numbers of variables (with population size 200) include 0.032 (for n = 2), 0.23 (for n = 5), 0.49 (for n = 10), 0.64 (for n = 20), 0.79 (for n = 50) and 0.84 (for n = 100). Some further research is needed before the SSI process with maximal values for PMMD can be used in practical applications.Anexample of the difculty of approximating the value of is given in Fig. 10b. Let us assume that we have xed = 1 / 7 a nd t h e d istance between the end points of the interval and outermost points is 1 / 8 a nd t h e d istance between intermediate points is 2 /8. Here we have only four points (denoted by black dots in the gure) and cannot t any more points even though eight points could be t in the interval in the optimal case (denoted by circles in the gure). In other words, it is very hard to determine the maximal minimum distance a priori when points are generated randomly. The third issue concerns the speed of point generators.We point out that the computational complexity of an algorithm may depend strongly on the implementation, see, e.g., [4], where bit-operations are considered. Thus, the results on the speed of the generators can be considered only suggestive. The last issue concerns the testing of point generators. There exists a large variety of dynamic statistical tests for the independence of points generated by point generators [14]. These tests are called goodness-of-t tests. One well-established battery of goodness-to-t tests is called DIEHARD and the source code is available at [8]. However, the goodness-to-t tests are designed to test the independence of large numbers of points. For example, DIEHARD requires a binary le of a size 1011 megabytes to evaluate a generator. Since we are only interested in a few hundred numbers at the beginning of the sequence, we used the Ripleys K-function instead of the goodness-to-t tests. 7 Conclusions and future research In this paper, we have tested different initial populations for a real coded genetic algorithm. Traditionally, pseudo random numbers are used for generating initial population, and our motivation has been to study and initiate discussion on whether their use is justied. We have shown with asimple, academic,example that initial populationsmay have a signicant effect on the best objective function value over several generations. Then we have concentrated on studying different realistic ways to generate initial populations for a case with no a priori information on the location of the global minima. We have briey summarized the basic properties of a pseudo and a quasi random sequence generator, the SSI process and the nonaligned systematic sampling and applied them. We have discussed their properties including genetic diversity and a gooduniform coverage for the points as well as speed and usability for the generators. In thenumerical tests with geneticalgorithms, we have used a test suite of 52 functions from the literature.We have studied the effects of the different initial populations on the best objective function values after 10 and 20 generations and after running the whole algorithm.

J Glob Optim (2007) 37:405436

429

There were differences in the coverage and genetic diversity of the tested point sets. The SSI process has a good uniform coverage, but only average genetic diversity, and for pseudo random points it is vice versa. The Niederreiter quasi random points have both above average coverage and above average genetic diversity. In the numerical experiments with the genetic algorithms, there was a trend although weakshowing that the versions of genetic algorithm with good genetic diversity converged fast during the rst generations, but did not obtain the best nal objective function values on the average. The converse was true for the versions with good uniform coverage. However, the differences were not so large that any strong conclusion could be drawn. With respect to the speed and usability of the point generators the results show that pseudo and quasi random sequence generators are fast and easy to use. Both the SSI process and the nonaligned systematic sampling require more developing and testing. We conclude that, based on our research, the traditional way to generate initial populations of genetic algorithms using pseudo random number generator was not worse than the others. It is particularly well suited to cases where the algorithm must be stopped prematurely; which may happen with computationally expensive real life problems. However, there are also good alternative ways such as quasi random sequences and the SSI process, which have advantages in certain cases, especially if the goodness of the nal solution is valued higher that the speed of convergence during the rst generations. Our ndings show that paying attention to the initial population may have an effect on the success of the genetic algorithm and further research and discussion is encouraged. The topics for future research include, rstly, to study different initial populations for specic types of problems and with different genetic algorithm parameters like population size and maximum number of generations. Secondly, to study more closely the problem of dimensionality discussed in Section 6, and thirdly, to further develop the SSI implementation and to discover good theoretical estimates for the maximal minimum distance PMMD.
Acknowledgements The authors thank doctors Salme Krkkinen andMarkoM.Mkel for useful discussions and professor Michael Mascagni for providing the Niederreiter generator. This research was supported by the Jenny and AnttiWihuri Foundation and the Academy of Finland, grant number 102020.

Appendix A This appendix is designed to give a short overview to different pseudorandomnumber and quasi random sequence generators. In the following denitions we use (y)modM to denote y modulo M. A.1 Some pseudo random number generators Traditionally, pseudo random number generators produce sequences of scalars. If vectors are needed, then they are usually formed by taking the rst n scalars for the rst n-dimensional vector, the next n scalars for the second vector and so forth. According to [30], it would be preferable to generate pseudo random vectors directly. Here, we present traditional pseudo random number generators and one vector generator.

430

J Glob Optim (2007) 37:405436

We classify pseudo random number generators to two main categories: congruential and recursive generators.We call amethod congruential, if it usesmodulo and only the previous iteration value, andwe call it recursive, if it uses values from several previous iterations. In addition, we present a feedback shift register generator and a vector generator called SQRT-generator, which do not fall into the two main categories. For another classication, see, e.g., [39]. The classications are always somewhat vague. Feedback shift register generator also uses values from several previous iterations, but its construction differs signicantly from recursive generators, and therefore, we put it in its own class. For more detailed information about pseudo random number generation we refer to [14] and to references therein. A.1.1 Congruential generators Next, we present ve congruential generators. They are called linear, quadratic, inversive, additive and parallel linear congruential generators. The name congruential comes from the use of modulo. For example, in modulo 5, the numbers 4 and 9 are called congruent. In what follows, M is themodulo and a j s are prescribed integers.On the ith iteration, the generators rst produce an integer y i [ 0, M ) and then arandom number x i [ 0, 1)is obtained by a division xi = yi /M unless another formula is given. 0 The seed y is a large prescribed integer. An additive congruential generator (of kth 0 order) [45], to be described below, requires k i + 1 seeds 0 yj < M, j = 0, ... , k ,a nd the parallel linear generator includes three separate linear generators denoted by y , i i y and y . Linear y 2 ) mod M 1y i = (a i1 + a Quadratic y i = (a 1 (y i1 ) y i = (a 1 Inversive
i 1 y 0 = y0 i i y i 1 2 y i1 2y

+ a 3 ) mod M
i 1

+a

xi Additive
i m

yi k M

+yi 1a ) mod M +

= ( (a 1 1 i1 = ym
2

i = (a 3 y y i 1 Parallel linear
y y + xi = (M
1 i

1 M , m = 1, ..., k m ) mod ) mod M 2 ) mod M 3 yi M2

yi M3

)mod 1.

mod M a generator with a =0 is called a congruential generators with a 2 = 0 a ndya 2 =) 0. Then 2 yi = (a yi1
2

In all of the above generators i = 1, 2, ... Often, a distinction is made between linear
i multiplicative linear congruential generator and a generator with a =0 a mixed linear j congruential generator.

A.1.2 Recursive generators

Generators using two or more previous random numbers when generating a new number in the sequence are here called recursive generators. In what follows, a , c , s,

J Glob Optim (2007) 37:405436

431

and r are prescribed integers s < r, j

= 1, ... , k , a nd ci is called the carry operator on the ith iteration. The determination of c is omitted for the multiply-with-carry-generator and also for the add-with-carry and substract-with-borrow generators usingmore i = yi that two previous values. Again, the pseudo random number x i [ 0, 1)is obtained by a division x /M. For more details about the carry operator, see, e.g., [6]. Next, we present ve recursive generators. ik y ) mod M 1y ky Multiplicative recursive y )modM

Lagged Fibonacci

= ( (a 1 y+ r ++ y i s i + i c ia y ii = 1, y ci =
i

= (y i r y i s 1, if y is y i r c i < 0 ) m o d M , w here is + y i r + c i 1 M + a 2 y i r + c i otherwise


i1

Add with carry

ci = y

0, 1y

is
i

Substract Again i with borrow yi = (yis y i r c i ) m o d M , w here 0, Multiply with carry


i

otherwise

= (a is computed using c

) mo d M , w here c and previous values of y .

= 1, 2, ... and y0 is the seed. The number of previous values of y used in the j of zeros and ones. methods described above may alter. In addition, the lagged Fibonacci generator may generate new values alsoi as a sum or a product of the previous values. j

+ + a 1 z + ap ) 1 A.1.3 Feedback shift register generator

p1

1)

Feedback i shift register generator uses modulo in a different manner i compared to the congruential and recursive generators. Modulo M is not a large integer, but very often M = 2 in which case the generator produces a sequence The sequence is then cut into length. These subsen p subsequences n p + 1 with an appropriate 1 n + + ) m o d n M . ( 2) pa quences correspond to yp s. To generate the sequence 1 , rst a primitivepolynomial q ( z ) of order p is selected 1 p [ 0, M ) can be selected arbitrarily. q ( z ) = z (a z where a {0, 1, ... ,M 1 } , i = 1, ... , p. Then using the coefcients a , t he nth num-

ber in the sequence is generated using the formula = (a +a The initial numbers , ... ,
1

432

J Glob Optim (2007) 37:405436

A.1.4 SQRT sequence The following denition of the SQRT sequence is taken from [43]. Let p i now be the ith prime number and z i = p i , i = 1, ... , n. Then the n-dimensional SQRT sequence { x i } is dened by x i = (iz) = (iz 1 , ... , iz n )mod 1, where themodulo is takencomponent-wise.Despite itssimplicity, the SQRT sequence performs well in [43], where it is applied to a numerical integration problem and compared with ve quasi random sequences. A.2 Some quasi random sequence generators To describe the structure and the distribution properties of some quasi random sequences let us dene an elementary interval in base b
n

E =
i=1

a i a i +1 , b di bdi

where a i , d i and b are integers d i 0, 0 a i < b d i n Rn


di 1 2 1 2

for i = 1, ... , n. Thus, E is a

subinterval of an n-dimensional unit cube I and its ith side has a length of t 1/b . Figure 11 illustrates 2-dimensional elementary intervals with base b = 2 a nd tm 2 a = 2. d = d = 2, and the colored area corresponds to values a = 1 a nd Quasi random sequences are called (t, s)-sequences if they satisfy the following distribution property (s is the dimension that we denote by n). For all integers k 0, m the point set { x j } of (t, s)-sequence with kbm j < ( k + 1)bm has exactly b points on every elementary interval in base b with volume b (in L -metric) [36]. The distribution properties for a (t, s)-sequence are the most preferable, when the distribution parameter t = 0, because then the rst bm successive points divide evenly on the corresponding elementary intervals and the same is true for the following b successive points and so on. 1 The class of (t, s)-sequences was introduced by Sobol in 1966 (see, [36]). He called them LP -sequences ( t) and studied them in base 2. Faure generalized the
Fig. 11 Two-dimensional elementary intervals

J Glob Optim (2007) 37:405436

433

(t, s)-sequences to arbitrary prime base b 2 (see, e.g., [36]). Finally, Niederreiter generalized (t, s)-sequence to arbitrary base b 2. Often a (t, s)-sequence is called the Sobol sequence if b = 2 a nd t = 0, and Faure sequence if b is a prime and t = 0 [30]. Sequences that are constructed according to the guidelines given in [30] are called Niederreiter sequences. For a general construction of any (t, s)-sequence, see, e.g., [20, 30]. Next, we present some examples how to construct (t, s)-sequences. We omit all implementational details. A.2.1 Van der Corput sequence Let b 2 be the base and k be an integer. If we write k in base b k = (d j ...d 1 ) b , where d i { 0, ... , b 1}, i = 1, ... , j, and dene a radical inverse function b by dj d1 + + j (1 3) j b b b then Van der Corput sequence in base b is the sequence {b ( k ) } k = 0 . Informally, we may say that in the radical inverse the numbers d j , ... , d 1 are reected using the decimal point as a reector, for example, if k = 12 = (1100)2 , t hen 2 ( 12 ) = 0.0011 2 .Van der Corput sequence is a 1-dimensional (t, s)-sequence with t = 0 (see, e.g., [30]). = (0.d ... b ( dk ) )= , A.2.2 Hammersley sequence All but the rst component of the n-dimensional Hammersley sequence are van der Corput sequences in an appropriate base. The n-dimensional Hammersley sequence is dened as x i = (i(/iN (i) , ... , b,N )), ib 1= 0, 1, ... n 1 1 ( 4)

where N is the number of points to be generated and the bases b 1 , ... , b n are integers greater than one and pairwise prime. A.2.3 Halton sequence A Hammersley sequence without the rst element is called Halton sequence [17] x i... = . (b 1 (i), ... , b n (i)), i = 0, 1, ( 5)

The advantage over the Hammersley sequence is that a Halton sequence does not require the knowledge of N, the total length of the sequence. A.2.4 Sobol sequence We construct the Sobol sequence following [4] and [14]. Let us rst consider 1-dimensional Sobol sequence, for which we need to create a set of direction numbers v i in base 2. To create the direction numbers v i , we use coefcients of a primitive polynomial q in the eld of binary numbers (compare to Eq. 1 in Sect. 7) + +a
q 1( zz ) += a

z. p + a 1 z p1

p(

6)

434

J Glob Optim (2007) 37:405436

Then, employing the bitwise binary exclusive-or operator we derive the direction numbers from the formula v i = a 1 v i1 a 2 v i2 a p v ip v ip , i > p. 2p

The ith recurrence in 1-dimensional Sobol sequence is now formed as x i = b 1 v 1 b 2 v 2 b 3 v 3 , where b 3 b 2 b 1 is the binary representation of i (for example, if i = 1011, then b 4 = 1, b 3 = 0, b 2 = 1 a nd b 1 = 1). To generate an n-dimensional sequence, it is sufcient to choose n different primitive polynomials and calculate n different sets of direction numbers. For more details, see [4]. A.2.5 Faure sequence The followingexample of Faure sequence ismainly from [43].Let us dene a function g g(b ( k )) =
b 1

c0 cj 1 , b + + b j
j b

where b is the base, ( k ) = (0.d ... d ) is an element of the Van der Corput sequence and j h! cm = d h mod b.
n=m

m(h m)! (b (i))),

Now, an n-dimensional (t as 1 i , s)-sequence is dened n x = ( (i), g( (i)), ... , g


b b

and a Faure sequence is obtained when b is the smallest prime larger or equal to the dimension n [36]. References
1. Acworth, P., Broadie, M., Glasserman, P.: A comparison of some Monte Carlo and quasi Monte Carlo techniques for option pricing. In: Niederreiter, H., Hellekalek, P., Larcher, G., Zinterhof, P. (eds). Monte Carlo and Quasi-Monte Carlo Methods 1996, number 127 in Lecture notes in statistics, pp. 118. Springer-Verlag, New York (1998) 2. Ali, M.M., Storey, C.: Modied controlled random search algorithms. Int. J. Comput. Mathemat. 53, 229235 (1994) 3. (1994) Ali,M.M., Storey, C.: Topographical multilevel single linkage. J. Global Optimizat. 5(4), 349358 4. Bratley, P., Fox, B.L.: Algorithm 659: implementing Sobols quasirandom sequence generator. ACM Trans. Mathemat. Software 14(1), 88100 (1988) 5. Bratley, P., Fox, B.L., Niederreiter, H.: Implementation and tests of low-discrepancy sequences. ACM Trans. Model. Comput. Simulat. 2(3), 195213 (1992) 6. Couture, R., LEcuyer, P.: Distribution properties of multiply-with-carry random number generators. Mathemat. Computat. 66(218), 591607 (1997) 7. Dekkers, A., Aarts, E.: Global optimization and simulated annealing.Mathemat. Program. 50 (3), 367393 (1991) 8. DIEHARD: a battery of tests for random number generators developed by George Marsaglia. http://stat.fsu.edu/~geo/diehard.html , September (2004). 9. Diggle, P.J.: Statistical Analysis of Spatial Point Patterns. Academic Press, London (1983)

J Glob Optim (2007) 37:405436

435

10. Dykes, S., Rosen, B.: Parallel very fast simulated reannealing by temperature block partitioning. In Proceedings of the 1994 IEEE International Conference on Systems, Man, and Cybernetics, vol. 2, pp. 19141919 (1994) 11. Eiben,A.E.,Hinterding,R.,Michalewicz,Z.:Parameter control in evolutionaryalgorithms. IEEE Trans. Evol. Computat. 3(2), 124141 (1999) 12. Floudas, C.A., Pardalos, P.M. (eds.): RecentAdvances inGlobalOptimization. PrincetonUniversity Press, Princeton, NJ (1992) 13. Genetic and evolutionary algorithm toolbox for use with Matlab. http://www.geatbx.com/, June (2004) 14. Gentle, J.E.:RandomNumberGeneration andMonteCarloMethods. Springer-Verlag,NewYork (1998) 15. Glover, F.,Kochenberger,G.A. (eds.):Handbook ofMetaheuristics.KluwerAcademic Publishers, Boston (2003) 16. Goldberg, D.E.: Genetic Algorithms in Search, Optimization, and Machine Learning. AddisonWesley, New York (1989) 17. Halton, J.H.: On the efciency of certain quasi-random sequences of points in evaluating multidimentional integrals. Numerische Mathematik, 2, 8490 (1960) 18. Horst,R., Pardalos, P.M. (eds.):Handbook ofGlobalOptimization.KluwerAcademic Publishers, Dordrecht (1995) 19. Kingsley, M.C.S., Kanneworff, P., Carlsson, D.M.: Buffered random sampling: a sequential inhibited spatial point process applied to sampling in a trawl survey for northern shrimp pandalus borealis in west Greenland waters. ICES J. Mar. Sci. 61(1), 1224 (2004) 20. Larcher, G.: Digital point sets: analysis and applications. In: Hellekalek, P., Larcher, G. (eds.) Random and Quasi Random Point Sets, number 138 in Lecture Notes in Statistics, pp. 167222. Springer-Verlag, New York (1998) 21. Lei,G.:Adaptive random search inquasi-Monte Carlomethods for globaloptimization.Comput. Mathemat. Appl. 43(67), 747754 (2002) 22. Lemieux,C., LEcuyer, P.:On selection criteria for lattice rules and otherquasi-Monte Carlo point sets. Mathemat. Comput. Simulat. 55(13), 139148 (2001) 23. Maaranen, H.: On Global Optimization with Aspects toMethod Comparison and Hybridization. Licentiate thesis, Department of Mathematical Information Technology, University of Jyvskyl (2002) 24. Maaranen, H., Miettinen, K., Mkel, M.M.: Quasi-random initial population for genetic algorithms. Comput. Mathemat. Appl. 47(12), 18851895 (2004) 25. Michalewicz, Z.: Genetic Algorithms + Data Structures = Evolution Programs. Springer-Verlag, Berlin (1994) 26. Michalewicz, Z., Logan, T.D., Swaminathan, S.: Evolutionary operators for continuous convex parameter spaces. In: Sebald, A.V., Fogel, L.J. (eds.) Proceedings of the 3rd Annual Conference on Evolutionary Programming, pp. 8497.World Scientic Publishing, River Edge, NJ (1994) 27. Morokoff, W.J., Caisch, R.E.: Quasi-random sequences and their discrepancies. SIAM J. Sci. Comput. 15(6), 12511279 (1994) 28. Neter, J.,Wasserman,W.,Whitmore,G.A.:Applied Statistics.Allyn andBacon, Inc.,Boston, third edition (1988) 29. Niederreiter, H.: Quasi-Monte Carlo methods for global optimization. In: Grossmann,W., Pug, G., Vincze, I., Wertz, W. (eds.) Proceedings of the 4th Pannonian Symposium on Mathematical Statistics, pp. 251267 (1983) 30. Niederreiter, H.: Random Number Generation and Quasi-Monte Carlo Methods. SIAM, Philadelphia (1992) 31. Nix, A.E., Vose, M.D.: Modeling genetic algorithms with Markov chains. Ann. Mathemat. Arti. Intell. 5(1), 7988 (1992) 32. Pardalos, P.M.,Romeijn,H.E. (eds.):Handbook ofGlobalOptimization, vol. 2.KluwerAcademic Publishers, Boston (2002) 33. Press, W.H., Teukolsky, S.A.: Quasi- (that is, sub-) random numbers. Comput. Phys. 3(6), 7679 (1989) 34. Ripley, B.D.: Spatial Statistics. John Wiley & Sons, New York (1981) 35. Snyder, W.C.: Accuracy estimation for quasi-Monte Carlo simulations. Mathemat Comput. Simulat. 54(13), 131143 (2000) 36. Sobol, I.M.: On quasi-Monte Carlo integrations. Mathemat. Comput. Simulat. 47(25), 103112 (1998) 37. Sobol, I.M., Bakin, S.G.: On the crude multidimensional search. J. Computat. Appl. Mathemat. 56(3), 283293 (1994)

436

J Glob Optim (2007) 37:405436

38. Sobol, I.M., Tutunnikov,A.V.:A varience reducingmultiplier forMonteCarlo integrations.Mathemat. Comput. Model. 23(89), 8796 (1996) 39. Tang,H.-C.:Using an adaptive geneticalgorithmwith reversals to nd good second-ordermultiple recursive random number generators. Mathemat. Methods Operat. Res. 57 (1), 4148 (2003) 40. Testproblems for globaloptimization. http://www.imm.dtu.dk/~km/GlobOpt/testex/ , June (2004) 41. Test problems in R 2 . http://iridia.ulb.ac.be/~aroli/ICEO/Functions/Functions.html , June (2004) 42. Trafalis, B., Kasap, S.: A novel metaheuristics approach for continuous global optimization. J. Global Optimizat. 23(2), 171190 (2002) 43. Tufn, B.: On the use of low discrepancy sequences inMonte Carlo methods.Monte CarloMethods Appl. 2(4), 295320 (1996) 44. Weisstein, E.W.: circle packing. From MathWorldA Wolfram Web Resource. http://mathworld.wolfram.com/CirclePacking.html, September (2004) 45. Wikramaratna, R.S.: ACORNa new method for generating sequences of uniformly distributed pseudo-random numbers. J. Computat. Phys. 83(1), 1631 (1989) 46. Wright, A.H., Zhao, Y.: Markov chain models of genetic algorithms. In: Banzhaf, W., Daida, J., Eiben, A.E., Garzon, M.H., Honavar, V., Jakiela, M., Smith, R.E. (eds.) GECCO-99: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 734741.Morgan Kauffman, SanMateo, California (1999)

You might also like