Steph Thesis
by
Stephanie Banares Ibo
College of Science
University of the Philippines
Diliman, Quezon City
April 2007
CERTIFICATION
This is to certify that this undergraduate thesis entitled "Characterization of the Penna Model by Simplex Projection Method" and submitted by Stephanie Banares Ibo to fulfill part of the requirements for the degree of Bachelor of Science in Physics was successfully defended and approved on March 27, 2007.
ABSTRACT
Table of Contents
Abstract
1 Introduction
3 Methodology
3.1 Penna Model Implementation
3.2 The Simplex Projection Method
3.3 Penna and SPM parameter correlation
5 Conclusion
A Programming Details
A.1 Independent Penna Population Generation
A.2 Distribution and Period of the Random Number Generator
A.3 Simplex Projection and Statistical Evaluation
Chapter 1
Introduction
In this work, the simplex projection method (SPM) is used to characterize the Penna model. This is done by correlating the parameters of the reconstruction method (SPM) with those of the model itself (Penna). The SPM has two main parameters: the embedding dimension, e, and the time delay or lag time, τ. The Penna model has three primary parameters that define the generated population: the birth rate, b; the mutation threshold, th; and the reproductive age, r. The SPM parameters are chosen such that they give the most accurate reconstruction.
Owing to the complexity of the model itself, chaotic regimes occur within the populations generated by the Penna model. These chaotic regimes depend on the Penna parameters that define the population. With this in mind, the correlation between the Penna and the SPM parameters is carried out in two regimes: chaotic and non-chaotic. Since varying the Penna parameters determines whether the population is chaotic, the behavior of the SPM parameters is observed as the Penna parameters are varied within each regime.
Chapter 2
distinguish chaos from random behavior in time series of about 10^d points, where d is the dimension of the attractor. Hence, with this new algorithm, low-dimensional chaotic systems may be detected with relatively small amounts of data, say 1000 data points. This argument justifies the use of a relatively short time series (of 1000 data points) to provide accurate forecasts.
These new algorithms provide accurate short-term forecasts if the time series is found to be chaotic, and are therefore loosely termed time delay prediction methods. The method of forecasting varies: in Casdagli [9] the forecasts are based on parametric linear regression; Nychka et al. [10] use nonparametric regression to find consistent estimates of the Liapunov exponents; Sugihara and May [11] suggest a nonparametric simplex projection method; and Alexander and Giblin [12] modified the Sugihara and May algorithm to use barycentric coordinates. In all of these methods, if the system is chaotic, very short-term forecasts will be quite accurate, and the correlation between actual and forecasted values will decline as the number of steps into the future increases. On the other hand, if the system is purely random, no such decrease in prediction accuracy will be evident; for many financial return series, even the one-step-ahead predictions are uncorrelated with the actual returns.
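To illustrate the claim above, here is a minimal Python sketch (not the thesis program, which is written in Fortran 77) of a nearest-neighbour forecast in the spirit of the simplex projection method. It forecasts t_p steps ahead from the e + 1 nearest neighbours in delay space and reports the Pearson correlation between forecasts and actual values. The test series are assumptions chosen for illustration: the logistic map at r = 3.9 stands in for a chaotic system and uniform noise for a random one; `simplex_forecast` and `logistic_series` are hypothetical names.

```python
import numpy as np


def simplex_forecast(x, e, tau, tp, n_lib):
    """Forecast x[t] for every t >= n_lib from the e + 1 nearest neighbours of x[t - tp].

    Delay vectors use the backward-lag convention {x_i, x_(i-tau), ..., x_(i-(e-1)tau)}.
    Only library points j with j + tp < n_lib are used, so every neighbour's future is known.
    """
    lags = tau * np.arange(e)
    lib = np.arange((e - 1) * tau, n_lib - tp)
    lib_vecs = x[lib[:, None] - lags]
    targets = np.arange(n_lib, len(x))
    preds = np.empty(len(targets))
    for k, t in enumerate(targets):
        dist = np.linalg.norm(lib_vecs - x[t - tp - lags], axis=1)
        nearest = lib[np.argsort(dist)[: e + 1]]
        preds[k] = x[nearest + tp].mean()  # unweighted mean of the neighbours' futures
    return x[targets], preds


def logistic_series(n, r=3.9, x0=0.3):
    """Chaotic test series from the logistic map x_(n+1) = r x_n (1 - x_n)."""
    x = np.empty(n)
    x[0] = x0
    for i in range(1, n):
        x[i] = r * x[i - 1] * (1.0 - x[i - 1])
    return x


series = {
    "chaotic": logistic_series(1000),
    "random": np.random.default_rng(1).random(1000),
}
horizons = (1, 5, 10)
skill = {}
for name, x in series.items():
    skill[name] = [np.corrcoef(*simplex_forecast(x, e=2, tau=1, tp=tp, n_lib=500))[0, 1]
                   for tp in horizons]
    print(name, np.round(skill[name], 2))
```

With this setup one would expect the chaotic series to score close to 1 at t_p = 1 and to lose skill as t_p grows, while the random series stays near zero at every horizon; the exact numbers depend on the series and the seed.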
There are a number of ways to measure the degree of chaos within nonlinear systems. Such methods include Poincaré sections, Lyapunov characteristic exponents (LCEs), and fractal dimensions [13]. However, the problem of determining quantities such as LCEs or fractal dimensions from experimental measurements is entirely different from computing these quantities in a mathematical (numerical) investigation. Fortunately, there is a partial answer to this problem that has been applied successfully to a large number of experimental investigations [14, 15]. The key idea is to replace the phase space trajectory as follows:
where y(t) is any one of the phase space variables x_i(t) or a functional combination of these variables. Thus, from a set of measurements of a single quantity, y(t), a sequence of points such as the following can be constructed in the artificial phase space:
dimension of the artificial phase space is large enough, the correlation dimension saturates and becomes constant. The minimum value of m for which the correlation dimension saturates is denoted m_0, and we have e = m_0 + 1, which is referred to as the embedding dimension, e. The quantity e represents the minimum dimensionality of the artificial phase space necessary to include the attractor.
The general idea of phase space trajectory reconstruction discussed above has been developed further to provide accurate trajectory reconstruction. In this work, the method of Sugihara and May, i.e., the simplex projection method, was implemented. Within this method, the choice of m and t translates to the choice of the SPM parameters: the embedding dimension, e, and the lag time, τ.
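As a sketch of how a choice of e and τ turns a scalar series into points of the artificial phase space, the helper below builds every delay vector of a series. `embed` is a hypothetical name, not part of the thesis program; the backward-lag ordering {x_i, x_(i−τ), ..., x_(i−(e−1)τ)} is the one used later in the thesis for the SPM.

```python
import numpy as np


def embed(x, e, tau):
    """Return every delay vector {x_i, x_(i-tau), ..., x_(i-(e-1)tau)} as one row.

    Row k corresponds to time index i = (e - 1) * tau + k (0-based), so a series of
    length N yields N - (e - 1) * tau points in the e-dimensional artificial phase space.
    """
    x = np.asarray(x, dtype=float)
    n_points = len(x) - (e - 1) * tau
    if e < 1 or tau < 1 or n_points <= 0:
        raise ValueError("need e >= 1, tau >= 1 and a series longer than (e - 1) * tau")
    # column c holds x_(i - c*tau): a slice of x shifted back by c*tau steps
    return np.column_stack([x[(e - 1 - c) * tau:(e - 1 - c) * tau + n_points] for c in range(e)])


points = embed(np.arange(10), e=3, tau=2)
print(points.shape)   # (6, 3)
print(points[0])      # [4. 2. 0.]
print(points[-1])     # [9. 7. 5.]
```

Larger τ spreads the components of each point further apart in time, and larger e adds components; both shrink the number of complete points available from a fixed-length series.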
on simulating population ageing, owing to the model's simplicity yet undoubtedly crucial role in representing¹ the genome of an individual from a given species population.
An individual's genome is represented by a bit string², each bit of which can have a value of 1 or 0. A healthy gene is represented by 0, and a bad or mutated gene by 1. Individuals initially have healthy bit strings, that is, bit strings composed entirely of zeroes. There are several important parameters that define a Penna population:
$$V = 1 - \frac{N(t)}{K} \qquad (2.5)$$
Here N(t) is the population size at time t. The Verhulst factor takes into account competition for resources such as space and food. It is rooted in the concept of the carrying capacity⁵, K: there are finite resources within the environment, which calls for competition within the population.
¹ With computer bit strings (for which the original model used 32 bits).
² In this work, an 8-bit string implementation was used.
³ In most works, only deleterious or harmful mutations [22, 23] are considered, since these mutations occur more frequently in nature than positive or beneficial mutations [24, 25].
⁴ Having th = 8 is the same as eliminating the effect of the mutation threshold, since the maximum lifespan is 8, which is equal to the bit string length used.
⁵ The sustainable population size, or the maximum number of individuals that an environment can support for a long time [26].
The Verhulst factor is implemented in the Penna model through the random death procedure. During each time step, a random number between zero and one is generated for each individual and compared with V. If it is greater than V, the individual dies regardless of its age and genome; otherwise, it survives. This procedure is termed the random death procedure.
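A minimal Python sketch of eq. (2.5) and the random death check follows. Python's `random` module stands in for the program's own generator, and the names `verhulst_factor` and `survives_random_death`, as well as the numbers N(t) = 2500 and K = 10000, are illustrative assumptions. In the original (VA) implementation this check would be applied to every individual at every time step.

```python
import random


def verhulst_factor(n_alive, capacity):
    """Verhulst factor V = 1 - N(t)/K, eq. (2.5)."""
    return 1.0 - n_alive / capacity


def survives_random_death(v, rng):
    """Random death procedure: draw a uniform number in [0, 1); the individual dies if it exceeds V."""
    return rng.random() <= v


rng = random.Random(1)                 # fixed seed so the run is reproducible
v = verhulst_factor(2500, 10000)       # N(t) = 2500, K = 10000  ->  V = 0.75
trials = 100_000
survival_rate = sum(survives_random_death(v, rng) for _ in range(trials)) / trials
print(v, round(survival_rate, 3))      # survival rate close to V
```

The fraction of survivors estimates V itself, which makes the point of the next paragraph concrete: every individual faces the same survival probability, whatever its genome.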
The need for the random death procedure, through the implementation of the Verhulst factor, stems from the limited ability of ageing models, including the Penna model, to represent the complex nature of the life and death cycles of populations found in nature [27]. Although the random death procedure is a vital part of the Penna model, there is no biological justification for it: in the population generated with such an implementation, all individuals have an equal probability of random death. Simply put, all individuals, however fit, die with equal probability, which is not observed in real systems. It is on this argument that Martins and Cebrat [27] based their suggested modification of the random death procedure.
In the modified random death procedure, the Verhulst factor is applied only to the newborns. This takes into account the ability of older individuals to adapt to their environment. This implementation is also supported by the theory of natural selection, in which fitter individuals have a higher probability of survival⁶. With this modification, the limiting effect of the random death procedure through the Verhulst factor is maintained. The populations generated with this modified implementation are termed VB populations, while those generated with the original implementation are termed VA populations. In this work, the VB implementation is used.
⁶ Some studies have shown that population saturation occurs at advanced ages by introducing a Fermi survival function [28].
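The VB rules above can be put together in a minimal Python sketch of one Penna time step. Several details are our assumptions, chosen as common Penna-model conventions rather than taken from the thesis program: each birth adds M = 1 new deleterious mutation (a random bit set from 0 to 1), bit p is switched on from age p + 1, individuals die on reaching age L = 8, reproduction is asexual with the child copying the parent's genome, and the run starts from 1000 healthy newborns. The default b = 1, r = 2, th = 2 and K = 10000 mirror values used in this thesis's figures; all function names are hypothetical.

```python
import random

L = 8  # bit-string length, which is also the maximum lifespan (footnote 4)


def active_mutations(genome, age):
    """Count the bad bits (1s) at positions 0 .. age-1, i.e. those already switched on at this age."""
    return bin(genome & ((1 << age) - 1)).count("1")


def vb_step(pop, b, r, th, K, M, rng):
    """Advance a VB population one time step; pop is a list of (age, genome) pairs."""
    v = 1.0 - len(pop) / K                      # Verhulst factor, eq. (2.5)
    nxt = []
    for age, genome in pop:
        age += 1
        if age >= L or active_mutations(genome, age) >= th:
            continue                             # death by old age or by reaching th
        nxt.append((age, genome))
        if age >= r:                             # reproductive individuals have b offspring
            for _ in range(b):
                child = genome
                for _ in range(M):               # M new deleterious mutations (0 -> 1 only)
                    child |= 1 << rng.randrange(L)
                if rng.random() <= v:            # VB: random death applied to newborns only
                    nxt.append((0, child))
    return nxt


def simulate(b=1, r=2, th=2, K=10000, M=1, steps=100, n0=1000, seed=1):
    """Return the population size after each time step, starting from n0 healthy newborns."""
    rng = random.Random(seed)
    pop = [(0, 0)] * n0
    sizes = []
    for _ in range(steps):
        pop = vb_step(pop, b, r, th, K, M, rng)
        sizes.append(len(pop))
    return sizes


sizes = simulate()
print(sizes[-5:])
```

The list of sizes is the kind of population time series that the later chapters reconstruct; older individuals are never subjected to the Verhulst check, which is the only difference from a VA step.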
Return maps [30] are graphical tools used to illustrate the attractors of a dynamical system. They are obtained by plotting x_n vs. x_{n+z}, where z = 1, 2, 3, ...⁷, for a set of data such as the population through time generated by the Penna model. Through return maps, the attractors can be seen and appreciated easily. Chaos is represented by multiple attractors/fixed points, as shown by x_n vs. x_{n+1} plots (first return maps, z = 1).
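Constructing a return map amounts to pairing each value with the one z steps later. A minimal sketch follows (`return_map` is a hypothetical helper; to draw the map, scatter the pairs with any plotting library). The two toy series show how a steady state collapses onto a single point while a period-2 cycle gives two points.

```python
def return_map(series, z=1):
    """Points (x_n, x_(n+z)) of the return map with delay z (z = 1 gives the first return map)."""
    if z < 1:
        raise ValueError("z must be a positive integer")
    return list(zip(series[:-z], series[z:]))


# A period-2 cycle collapses onto two points; a steady state onto one.
cycle = [0.8, 1.2] * 50
steady = [1.0] * 100
print(sorted(set(return_map(cycle))))    # [(0.8, 1.2), (1.2, 0.8)]
print(set(return_map(steady)))           # {(1.0, 1.0)}
```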
In the Penna model, return maps were able to show the stability of the system under varying parameters. For example, a previous study showed that as the birth rate b is increased⁸, high fluctuations occur, suggesting chaotic regimes for that population. This is verified by the return map shown in figure 2.1. For b = 1, there is a single attractor, and thus the system is stable. For b = 6, there are multiple attractors, suggesting a chaotic regime.
The same study showed that for high threshold values, high fluctuations occur, again suggesting chaotic regimes. This was also verified by the return map shown in figure 2.2: at th = 8, multiple attractors occur, suggesting a chaotic regime.
⁷ The first return map, for which z = 1, is enough to display the attractors of the system and thus to gauge its stability.
⁸ Even at b = 2, fluctuations become prominent.
[Figure 2.1: Return maps (x(n+1) vs. x(n)) for the 8-bit VB population. Parameters are b = 1, 6 (asterisks and squares, respectively), r = 2, th = 2. [30]]
[Figure 2.2: Return maps (x(n+1) vs. x(n)) for the 8-bit VB population with a high th. Parameters are b = 1, r = 2, th = 1, 8 (solid squares and empty circles, respectively). [30]]
2.4 Periodicity within the Penna Model through Age demographics
In population models, demography has been an important field of study due to its effect on the growth and decline of populations. In particular, the Penna model is a population model based on the theory of senescence, in which deleterious mutations are more likely to manifest at advanced ages. Such mutations have adverse effects later in an individual's life and therefore on a population's survival and fertility [2]. It is with this idea that age structure becomes an important feature of demography. The Gompertz law of exponential increase in mortality implies an exponential decrease of the population with respect to its component ages. An important demographic feature related to age structure in population models is the generation time: the time between an individual's birth and the time it is able to produce its own offspring (the reproductive age). In a previous study [29], the age structure⁹ within Penna populations was investigated. This study was done to further investigate the observed cyclic pattern [30] and to verify the suggested periodicity [31] found within the population.
⁹ Age structure describes the distribution of a population at a particular time over its component ages; that is, the normalized number of individuals plotted with respect to age.
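An age structure as defined in footnote 9 can be computed from the ages of the individuals alive at one time step. The sketch below normalizes by the total population, which is one reasonable reading of "normalized" and is our assumption; `age_structure` is a hypothetical helper, and max_age = 8 matches the 8-bit strings used here.

```python
from collections import Counter


def age_structure(ages, max_age=8):
    """Fraction of the population at each age 0 .. max_age-1 (normalised so the fractions sum to 1)."""
    if not ages:
        raise ValueError("empty population")
    counts = Counter(ages)
    return [counts.get(a, 0) / len(ages) for a in range(max_age)]


ages = [0, 0, 0, 0, 1, 1, 1, 2, 2, 3]   # ages of the individuals alive at one time step
print(age_structure(ages))              # [0.4, 0.3, 0.2, 0.1, 0.0, 0.0, 0.0, 0.0]
```

Tracking this list over successive time steps is one way to look for the cyclic pattern and short-term periodicity mentioned above.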
Chapter 3
Methodology
preset number of allowed mutations, th; 2) the individual reaches the preset number of allowed mutations, th; 3) a newborn dies when a random number generated between zero and one is greater than the Verhulst factor, V.¹
In this work, the Penna model is characterized in two regimes: chaotic and non-chaotic. The populations under study are therefore those having parameters (b, r, th) corresponding to population time series that are chaotic or non-chaotic in nature². For this study, increasing the parameter b will, for the most part, give the chaotic regime of the Penna populations. The population time series corresponding to each regime are then characterized by reconstructing them and looking into their predictability³. Different methods/algorithms (see section 2.1) could be used to make short-term predictions based on a library of patterns within the series. In this case, the simplex projection method is employed.
termed strange attractors by Ruelle and Takens [32]. They then went on to conjecture that these strange attractors are the cause of turbulent behavior in fluid flow [14].
SPM was then used for different applications. It was used to distinguish chaos from measurement error [11, 33] and from uncorrelated noise in time series [34]. It was also used for resolution enhancement and signal recovery of Raman spectra [35]. SPM has found a number of applications due to its ability to provide accurate short-term predictions even when the dynamics or the mathematical model of the time series is not known a priori.
For this work, SPM was used to reconstruct the population time series generated by the Penna model implementation discussed in the preceding section. SPM was implemented [36] as follows: the 1000-point time series generated by the Penna model is divided into two parts; the first 500 points serve as the basis points from which a library of patterns is determined, and this library is then used to predict the next 500 points. The method is discussed below in stepwise fashion.
The basis points from the original time series are given by
$$\{x_i\} = \{x_1, x_2, \ldots, x_N\}, \qquad N = 500 \qquad (3.1)$$
from which the next 500 points will be predicted; that is, x_i for 500 < i ≤ 1000 will be predicted.
Now that the basis points are established, an embedding dimension, e, must be chosen. The embedding dimension determines the number of values from the real time series that make up one point in e-space. For example, if e is set to 3, each point in the real space, x_i, i = 1, 2, ..., N, is represented by a 3-component (e-component), or 3-dimensional, point in e-space⁴:
$$\mathbf{x}^{e=3}_i = \{x_i, x_{i-\tau}, x_{i-2\tau}\} \qquad (3.2)$$
Next, a value for the time delay or lag time, τ, must be chosen. As an example, if τ is set to 1 with e = 3, eq. (3.2) becomes
$$\mathbf{x}^{e=3}_i = \{x_i, x_{i-1}, x_{i-2}\} \qquad (3.3)$$
The step x_i (i > N) to predict and the number of steps into the future, t_p, to use are then chosen. For example, for N = 500, if we are to determine x_501 with t_p = 2, the e-dimensional point
$$\mathbf{x}^{e=3}_{499} = \{x_{499}, x_{498}, x_{497}\} \qquad (3.4)$$
$$\mathbf{x}^{e=3}_{491} = \{x_{491}, x_{490}, x_{489}\} \qquad (3.5)$$
will be used.
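The index bookkeeping above can be sketched as follows, using the thesis's 1-based indices. We assume that the reference point for predicting x_target is x_(target − t_p), which reproduces (3.4); with t_p = 10 the same rule gives the indices in (3.5), although the text does not state which t_p (3.5) was computed with. `reference_point` is a hypothetical helper.

```python
def reference_point(target, tp, e, tau):
    """1-based indices {x_j, x_(j-tau), ..., x_(j-(e-1)tau)} of the reference point j = target - tp."""
    j = target - tp
    if j - (e - 1) * tau < 1:
        raise ValueError("reference point has no complete delay history")
    return [j - k * tau for k in range(e)]


print(reference_point(501, 2, 3, 1))    # [499, 498, 497]  as in eq. (3.4)
print(reference_point(501, 10, 3, 1))   # [491, 490, 489]  the indices of eq. (3.5)
print(reference_point(501, 2, 3, 2))    # [499, 497, 495]
```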
The N points are then plotted in the e-dimensional space, after which the nearest neighbors of the initial reference point are determined. The e + 1 nearest points are taken as the nearest neighbors (excluding the reference itself, as it is not its own neighbor), keeping track of each point's position in the sequence, x^e_n. One then has x^e_{n_1}, x^e_{n_2}, ..., x^e_{n_{e+1}} as the nearest neighbors.
The prediction is the mean, over the e + 1 neighbors, of the value t_p steps ahead of each neighbor's most forward component:
$$x_{pred} = \frac{x_{n_1+t_p} + x_{n_2+t_p} + \cdots + x_{n_{e+1}+t_p}}{e+1} \qquad (3.6)$$
where x_pred is the predicted point. As mentioned earlier, for this implementation, x_pred ranges from x_501 to x_1000. From these predicted values, the original time series can now be reconstructed.
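The neighbor search and eq. (3.6) can be sketched in Python as below. This is an illustration, not the thesis's Fortran routine: restricting neighbors to basis points whose value t_p steps ahead also lies in the basis is our reading of the procedure, and the plain (unweighted) mean follows eq. (3.6) rather than the exponentially weighted average of Sugihara and May's original method. The names and the period-10 test series are hypothetical.

```python
import numpy as np


def spm_predict(x, target, tp, e, tau, n_basis=500):
    """Predict x[target] (0-based) with eq. (3.6).

    The reference point is the delay vector ending at x[target - tp]; its e + 1 nearest
    neighbours are searched among the basis points x[0:n_basis] whose value tp steps
    ahead is also inside the basis, and the prediction is the plain mean of those values.
    """
    x = np.asarray(x, dtype=float)
    lags = tau * np.arange(e)                          # {x_i, x_(i-tau), ..., x_(i-(e-1)tau)}
    ref = target - tp
    cand = np.arange((e - 1) * tau, n_basis - tp)
    cand = cand[cand != ref]                           # the reference is not its own neighbour
    dist = np.linalg.norm(x[cand[:, None] - lags] - x[ref - lags], axis=1)
    nearest = cand[np.argsort(dist, kind="stable")[: e + 1]]
    return x[nearest + tp].mean()


def reconstruct(x, tp, e, tau, n_basis=500):
    """Predict every point after the basis, i.e. x_501 .. x_1000 in the thesis's 1-based labels."""
    return np.array([spm_predict(x, t, tp, e, tau, n_basis) for t in range(n_basis, len(x))])


saw = np.arange(1000) % 10            # a noiseless period-10 test series (illustrative only)
pred = reconstruct(saw, tp=1, e=3, tau=1)
print(np.allclose(pred, saw[500:]))   # True
```

On a noiseless periodic series every neighbor repeats the reference pattern exactly, so the reconstruction is perfect; on real population series the quality of the reconstruction is what the correlation coefficient measures.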
⁴ Note that for N elements of eq. (3.1), there are N − e corresponding e-dimensional points.
⁴ That is, points x^{e=3}_{499} and x^{e=3}_{491} for our previous examples.
3.3 Penna and SPM parameter correlation
To characterize the Penna model through the simplex projection method (SPM), a relationship between their parameters must be established, since the respective parameters represent the model and the method. For the Penna model, parameter sets are chosen so as to characterize the model with increasing values of these parameters. We also choose the parameters corresponding to the non-chaotic and chaotic regimes [30]. The Penna parameters considered are:
1. reproductive age, r: variation of this parameter does not have any effect on the chaoticity of the VB populations generated by the Penna model.
2. mutation threshold, th: a previous study [30] showed that, for the VB implementation of the Penna model, an increase in th results in pronounced fluctuations within the population time series, suggesting the appearance of chaos.
With the chosen parameters, the generated population time series are then reconstructed using SPM. Reconstruction of each time series was done over a certain range⁵ of the following SPM parameters:
1. embedding dimension, e
2. lag time, τ
⁵ This is the scanning range, the range of values of e and τ from which the appropriate (e, τ) pair is selected.
Chapter 4
particular set of parameters: r, b and th. The reconstructed series that yields the maximum correlation coefficient gives the appropriate e and τ. The appropriate SPM parameters are determined separately for each regime: for population series within the chaotic regime and for those within the non-chaotic regime of the Penna model.
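The variable names in the appendix listing (diffAct, diffPred, prodNum, sumActDen, sumPredDen) suggest the usual Pearson formula, whose square is the ρ² used to rank (e, τ) pairs. A plain-Python sketch of that formula follows; that it matches the program's `pcc` routine, which is not shown in full, is an assumption, and `pearson_r` is a hypothetical name.

```python
import math


def pearson_r(actual, predicted):
    """Pearson correlation coefficient between the actual and the predicted halves."""
    n = len(actual)
    if n != len(predicted) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    diff_a = [a - mean_a for a in actual]
    diff_p = [p - mean_p for p in predicted]
    num = sum(da * dp for da, dp in zip(diff_a, diff_p))
    den = math.sqrt(sum(da * da for da in diff_a) * sum(dp * dp for dp in diff_p))
    if den == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return num / den


r = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(r, r ** 2)        # 1.0 1.0
```

Squaring discards the sign, so a perfectly anti-correlated reconstruction would also score ρ² = 1; ranking by ρ² therefore assumes that the best reconstructions are positively correlated with the data.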
For the chaotic regime, representative data were taken, and the summary is given in tables 4.1 and 4.2. Table 4.1 lists representative data for varying b and th, which, when increased, give the chaotic regimes. Table 4.2, on the other hand, lists representative data with respect to the Penna parameter r (a non-chaotic parameter; the chaoticity of the population is maintained by using b = 2).
[Figure 4.1: ρ² vs. embedding dimension, e, and lag time, τ, within the chaotic regime. The appropriate embedding dimension, e = 27, and lag time, τ = 5, are given by the highest peak in the plot; Penna parameters: b = 4, r = 2 and th = 2.]
[Figure 4.2: ρ² vs. embedding dimension, e, and lag time, τ, within the chaotic regime. The appropriate embedding dimension, e = 26, and lag time, τ = 9; Penna parameters: b = 1, r = 2 and th = 6.]
r = 2, th = 2      e     τ
b = 2             21     9
b = 3              3     5
b = 4             27     5
b = 5              3    10
b = 6             14    12

b = 1, r = 2       e     τ
th = 3            11     8
th = 4            30     9
th = 5             3     9
th = 6            26     9
th = 7            10    10
th = 8             3    14
Table 4.2: Representative data of the appropriate SPM parameter set, e and τ, within the chaotic regime (b = 2) for varying r values (th values were taken at 2 and 8).
4.1.1 The Embedding Dimension, e, in relation to the Penna Parameters: b, th and r
The embedding dimension, e, for varying birth rates, b
[Figure 4.3: Embedding dimension, e, vs. birth rate, b (r = 2, th = 2).]
From figure 4.3², we see that the minimum and maximum values of the appropriate embedding dimension occur at birth rates b = 3 and b = 4, respectively. The return maps for the populations with r = 2, th = 2 and b values of 3 and 4 are plotted in figure 4.4.
[Figure 4.4: The first return map (z = 1) of 100 points for populations r = 2, th = 2 with different b values (3 and 4). The maximum population, K = 10000.]
For b = 3, where the minimum appropriate embedding dimension occurs (e = 3), the return map is described by the squares in figure 4.4. The return map for b = 3 shows a cyclic pattern compared to that for b = 4 (described by asterisks, *), whose appropriate embedding dimension is high (e = 27). This result is consistent with the conjecture that a higher embedding dimension represents a higher degree of complexity [36]. The cyclic pattern suggests short-term periodicity [29, 30], which could be viewed as a form of order within the system: although there is no single period to be found, the period changes follow an ordered pattern.
² Each point represents a different Penna population with a unique set of parameters b, r and th.
the maximum value of e occurs at th = 4 and the minimum at th = 5. The return maps for these two populations are illustrated in figure 4.6. As with the return maps for varying b in figure 4.4, the population with the smaller e value is the more ordered system, since less spread occurs in its return map. Here, the population with a low e value is the one with th = 5 (described by squares in the return map of figure 4.6), and the one with a high e value is the one with th = 4. The result for varying th is thus again consistent with the conjecture that a higher e value represents a more complex nature [36].
[Figure: Embedding dimension, e, vs. mutation threshold, th (b = 1, r = 2).]
[Figure 4.6: The first return map (z = 1) of 100 points for populations r = 2, b = 1 with different th values (4 and 5). The maximum population, K = 10000.]
[Figure: Embedding dimension, e, vs. reproductive age, r (b = 2, th = 2).]
[Figure 4.8: The first return map (z = 1) of 100 points for populations b = 2, th = 2 with different r values (1 and 2). The maximum population, K = 10000.]
[Figure: Embedding dimension, e, vs. reproductive age, r (b = 2, th = 8).]
[Figure 4.10: The first return map (z = 1) of 100 points for populations with different r values (1 and 5). The maximum population, K = 10000.]
[Figure 4.11: (a) The appropriate lag time, τ, with respect to birth rate, b (th = 2 and r = 2). (b) The appropriate lag time, τ, with respect to mutation threshold, th (b = 1 and r = 2).]
[Figure: Lag time, τ, vs. reproductive age, r (panels: b = 2, th = 2 and b = 2, th = 8).]
4.1.3 Summary of Results for the Chaotic Regime
For a summary of the results for the chaotic regime, refer to figures 4.13 and 4.14.
[Figure 4.13: SPM parameters for varying Penna parameters: (a) e and τ (represented by circles and squares, respectively) with respect to b; (b) e and τ (represented by circles and squares, respectively) with respect to th.]
[Figure 4.14: SPM parameters (embedding dimension, e, and lag time, τ) with respect to reproductive age, r; two panels.]
The embedding dimension in relation to the Penna parameters:
1. embedding dimension with respect to birth rate (e vs. b): e varies widely for different b values. The result is consistent with the conjecture that a higher e represents greater complexity within the population, as shown by the return maps of populations with varying b values.
4. the appropriate lag time, τ, for Penna populations within the chaotic regime seems to have a value close to 10, the short-term period [29].
4.2 The Appropriate SPM Parameters in the Non-Chaotic Regime
The non-chaotic regimes of the Penna model are those with b = 1 and low th values, with varying r values. Figures 4.15 and 4.16 illustrate the choice of the appropriate SPM parameters, e and τ, for populations corresponding to the non-chaotic regime. The highest peak of each plot gives the corresponding appropriate (e, τ) pair, since it gives the best reconstruction, i.e., the highest ρ². For the non-chaotic regime, the scanning range for the appropriate embedding dimension is kept at 3-30, as in the chaotic regime, and a range of 1-14 was likewise used for the appropriate lag time.
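The scan described above can be sketched in Python: reconstruct the second half of the series for every (e, τ) pair in the scanning range and keep the pair with the highest ρ². The ranges default to the thesis's 3-30 for e and 1-14 for τ; t_p = 1, the 500-point basis, the logistic-map test series and the reduced grid in the usage lines are assumptions for illustration, and `spm_rho2` and `scan` are hypothetical names.

```python
import numpy as np


def spm_rho2(x, e, tau, tp=1, n_basis=500):
    """rho^2 between the points after the basis and their simplex-projection predictions."""
    x = np.asarray(x, dtype=float)
    lags = tau * np.arange(e)                        # {x_i, x_(i-tau), ..., x_(i-(e-1)tau)}
    lib = np.arange((e - 1) * tau, n_basis - tp)     # basis points with a known future
    lib_vecs = x[lib[:, None] - lags]
    targets = np.arange(n_basis, len(x))
    pred = np.empty(len(targets))
    for k, t in enumerate(targets):
        dist = np.linalg.norm(lib_vecs - x[t - tp - lags], axis=1)
        nearest = lib[np.argsort(dist)[: e + 1]]
        pred[k] = x[nearest + tp].mean()             # unweighted mean, eq. (3.6)
    r = np.corrcoef(x[targets], pred)[0, 1]
    return r * r


def scan(x, e_values=range(3, 31), tau_values=range(1, 15), tp=1):
    """Return (e, tau, rho2) with the highest rho2 over the scanning ranges."""
    best = (None, None, -1.0)
    for e in e_values:
        for tau in tau_values:
            rho2 = spm_rho2(x, e, tau, tp)
            if rho2 > best[2]:                       # a NaN rho2 (constant prediction) is skipped
                best = (e, tau, rho2)
    return best


# Illustrative chaotic series (logistic map), scanned over a reduced grid to keep the demo fast.
x = np.empty(1000)
x[0] = 0.3
for i in range(1, 1000):
    x[i] = 3.9 * x[i - 1] * (1.0 - x[i - 1])
best_e, best_tau, best_rho2 = scan(x, e_values=range(3, 9), tau_values=range(1, 6))
print(best_e, best_tau, round(best_rho2, 3))
```

Calling `scan(series)` with the defaults reproduces the full 3-30 by 1-14 grid; keeping the first pair on ties mirrors picking the highest peak of a ρ² surface such as those in figures 4.15 and 4.16.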
[Figure 4.15: ρ² vs. embedding dimension, e, and lag time, τ, within the non-chaotic regime. The appropriate embedding dimension, e = 20, and lag time, τ = 3; Penna parameters: b = 1, th = 1 and r = 5.]
[Figure 4.16: ρ² vs. embedding dimension, e, and lag time, τ, within the non-chaotic regime.]
Table 4.3 summarizes the appropriate SPM parameters in the non-chaotic regime. This summary is then used to correlate the appropriate SPM parameters, e and τ, with the Penna parameter r.
Table 4.3: Summary of the appropriate SPM parameters, e and τ, within the non-chaotic regime (b = 1, th = 1 with varying r, and b = 1, th = 2 with varying r).
b = 1, th = 1      e     τ
r = 1             20    14
r = 2              3    12
r = 3             13     8
r = 4             26     2
r = 5             20     3

b = 1, th = 2      e     τ
r = 1             19     6
r = 2             12    14
r = 3              6     6
r = 4             17    13
r = 5             18     5
r = 6             18     4
r = 7             25     1
decreases as r increases.
[Figure: Embedding dimension, e, vs. reproductive age, r (b = 1, th = 1).]
[Figure: Embedding dimension, e, vs. reproductive age, r (b = 1, th = 2).]
[Figure 4.19: The first return map (z = 1) of 100 points for populations b = 1, th = 1 with different r values (2 and 4). The maximum population, K = 10000.]
[Figure 4.20: The appropriate lag time, τ, for varying reproductive age, r (b = 1): (a) th = 1; (b) th = 2.]
4.2.2 Summary of Results for the Non-chaotic Regime
The embedding dimension for the non-chaotic regime varies minimally
[Figure: SPM parameters (embedding dimension, e, and lag time, τ) vs. reproductive age, r, in the non-chaotic regime; two panels.]
2. The appropriate lag time, τ, takes low values and generally decreases as r increases.
Chapter 5
Conclusion
The VB implementation of the Penna model was used to generate the population time series to be characterized. Population time series defined by different birth rates, b, reproductive ages, r, and mutation thresholds, th, were generated. Since the Penna model is to be characterized by the simplex projection method, the generated population time series were reconstructed using SPM. The characterization is done in two regimes, chaotic and non-chaotic, by correlating the Penna parameters with the SPM parameters, with the Penna parameters defining the regime. That is, the choice of the Penna parameters depends on each parameter's effect on the chaoticity, or lack of it, in the population. High values of the Penna parameters birth rate, b, and mutation threshold, th, yield the chaotic regime of the Penna populations¹. The appropriate SPM parameters, embedding dimension, e, and lag time, τ, are chosen using a statistical measure, in this case the Pearson correlation coefficient. The square of this coefficient is plotted over the SPM parameters e and τ, and the maximum ρ² value gives the coordinates of the appropriate e and τ.
¹ However, some of the return maps showed that certain populations with b > 2 and high th can, for certain combinations of b, th and r, exhibit periodicity and are therefore non-chaotic. The complexity map of the Penna model could still be investigated for a full mapping.
In summary, in the chaotic regime (i.e., for populations defined by high b and th), the appropriate e takes on varying values with increasing Penna parameters; that is, the appropriate embedding dimension, e, exhibits no general trend with respect to r, b and th. The appropriate embedding dimension was seen to depend on the complexity of the population. Higher dispersion in the population time series is deemed to represent greater complexity, since high population fluctuations are evidence of chaoticity. In the return maps, high population fluctuation translates into greater dispersion for the populations described by a high embedding dimension when reconstructed using SPM. The results are consistent with the conjecture that high e values correspond to greater complexity [36].
The lag time is the distance between the points (within the basis points, i.e., the first half of the 1000-point time series) that are projected into the e-dimensional space. The points in the e-dimensional space create the simplex, or neighborhood of points, which is then used to reconstruct the second half of the series. The appropriate lag time, τ, in the chaotic regime takes on values around 10. This is consistent with the short-term period found from the component ages [29]: high correlation occurs when the lag time used is 10. Since the short-term period is 10, points separated by 10 time steps generally have the same value (here, the value reflects the distribution of component ages), so higher correlation is expected when the points used for prediction are related in this way.
In the non-chaotic regime, return maps show single-point attractors, which conforms with the non-chaotic regime since periodicity is present. The appropriate embedding dimension, e, generally increases as r increases, while the appropriate lag time, τ, generally decreases as r increases.
Appendix A
Programming Details
Machine specifications:
Operating System: Linux (Ubuntu 6.06)
Machine epsilon: 1.08420217E-19
Compiler: GNU project Fortran 77 compiler
A.2 Distribution and Period of the Random Number Generator
A discussion of the random number generator used in the population generation can be found in one of the cited references [30], together with the code for Penna population generation. That reference showed, via a return map of the first 10000 points generated between 0 and 1, that the points produced by the random number generator used (randlcm) are evenly distributed (see figure A.1 [30]).
[Figure A.1: Return map of the points generated by the random number generator used. 10000 points were used with nseed = 1.]
From the same reference (Section B.2 of its Appendix): the random number generator randlcm() used in the program has a period of about 2 billion, so it can generate numbers between 0 and 1 this many times before the sequence repeats. Checking how many times randlcm() is called per individual within the program, we see that there are only two calls: one in subroutine mutate() for the random death procedure and the other in subroutine mutate() for the mutation process. A maximum of approximately 25000 individuals are alive in one time step, so randlcm() is called at most 50000 times per time step. With the available period of randlcm(), a maximum of 40000 time steps is allowed, which is more than enough for this study. Thus, exhaustion of the random number generator, where it begins to repeat the values it generates, is avoided.
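The budget in the paragraph above can be checked with a few lines of arithmetic; the figures are those quoted in the text, and the variable names are only labels.

```python
period = 2_000_000_000        # randlcm() period: "about 2 billion" numbers
max_alive = 25_000            # maximum individuals alive in one time step
calls_per_individual = 2      # one call for random death, one for mutation
calls_per_step = max_alive * calls_per_individual
max_steps = period // calls_per_step
print(calls_per_step, max_steps)   # 50000 40000
```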
* xval value where function is evaluated (?) *
* xvalar(), xvalar2() arrays of xval *
* xstep stepsize *
* tp timestep(s) into the future *
* nbasis *
* nextxt predicted value *
* *
* Last modified: 2007.03.21 IBO, SB *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
implicit none
character*1 choice
real*4 t(1000),x(1000),xb(1000)
real*4 xact(1000)
real*4 xpred(1000)
real*4 xact2(1000), xpred2(1000)
real*4 xval
real*4 xvalar(1000), xvalar2(1000)
real*4 err(500)
real*4 sumAct,sumPred
real*4 countAct,countPred
real*4 meanAct,meanPred
real*4 diffAct(500),diffPred(500)
real*4 prodNum(500)
real*4 sumProd
real*4 sumActDen,sumPredDen
real*4 diffActDen(500),diffPredDen(500)
real*4 cc,num,den
real*4 xmax
real*4 xstep
integer*4 embdim
integer*4 tp,nahead
integer*4 nbasis
real*4 nextxt
integer*4 i,j,k
real*8 edimdist
external edimdist
read(1,*,end=100)x
!write(*,*)x
xmax = 1000
nahead = 500
xstep = xmax/999
do embdim = 3,20
do i = 1,1000
xval = (i-1)*xstep
xvalar(i) = xval
xact(i) = x(i)
end do
do i = 1,1000
xpred(i) = xact(i)
end do
call pcc(xpred,xact,nbasis,nahead,cc)
write(*,*)cc
end do
close(1)
100 continue
end !program sp.f
************************************************************************
subroutine nextval(tp,edim,n,x,nextx)
implicit none
integer*4 tp, edim, n !n is number of point so far
integer*4 n1 !n1 is index of chosen point
real*4 nextx !intent(out)
real*4 x(1000),dist(1000)
integer*4 indx(1000)
integer*4 j
real*8 edimdist
external edimdist
n1 = n-(tp-1) !N+1 is predicted by adding tp
!Calculate distances from n1
do j=1,n
dist(j) = edimdist(edim,x,n1,j)
end do
nextx = nextx/(edim+1)
************************************************************************
real*8 function edimdist(edim,x,n1,n2)
implicit none
integer*4 edim, n1, n2
real*4 x(1000)
real*4 sumsq
integer*4 i
integer*4 t
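The routines nextval and edimdist above compute distances between delay vectors and a forecast averaged over the edim+1 nearest points (the division by edim+1 in nextval suggests a plain average). A self-contained sketch of that scheme follows. It is not the thesis code: it assumes unit lag, delay vectors (x(t), x(t-1), ..., x(t-edim+1)), a library made of all points whose value tp steps ahead is already known, and the hypothetical names delay_dist and simplex_forecast.

```fortran
! Sketch of a simplex-projection style forecast (unweighted average
! of the edim+1 nearest neighbours). ASSUMPTIONS: unit lag; library =
! delay vectors ending at t = edim .. n-tp. Names are hypothetical.
module simplex_sketch
  implicit none
contains
  ! Euclidean distance between the delay vectors ending at t1 and t2.
  real function delay_dist(edim, x, t1, t2)
    integer, intent(in) :: edim, t1, t2
    real, intent(in) :: x(:)
    integer :: i
    real :: sumsq
    sumsq = 0.0
    do i = 0, edim - 1
      sumsq = sumsq + (x(t1 - i) - x(t2 - i))**2
    end do
    delay_dist = sqrt(sumsq)
  end function delay_dist

  ! Forecast x(n+tp) from x(1:n): find the edim+1 library vectors
  ! nearest to the one ending at n and average their values tp ahead.
  real function simplex_forecast(edim, tp, n, x)
    integer, intent(in) :: edim, tp, n
    real, intent(in) :: x(:)
    real :: dist(n)
    integer :: t, k, nlib, best
    nlib = n - tp                    ! last point with a known future
    if (nlib - edim + 1 < edim + 1) error stop 'library too short'
    dist = huge(1.0)                 ! non-library points never chosen
    do t = edim, nlib
      dist(t) = delay_dist(edim, x, n, t)
    end do
    simplex_forecast = 0.0
    do k = 1, edim + 1
      best = minloc(dist, dim=1)     ! nearest remaining neighbour
      simplex_forecast = simplex_forecast + x(best + tp)
      dist(best) = huge(1.0)         ! exclude it from later searches
    end do
    simplex_forecast = simplex_forecast/real(edim + 1)
  end function simplex_forecast
end module simplex_sketch
```

For a series with period 4 (1, 2, 3, 4, 1, 2, ...) of length 20, simplex_forecast(2, 1, 20, x) returns 1.0, the value that follows 4 in the cycle. Other implementations of the simplex projection weight the neighbours by their distance; this sketch keeps the plain average.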
************************************************************************
subroutine indexx(n,arr,indx)
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* Indexes an array arr(1:n), i.e., outputs the array indx(1:n) *
* such that arr(indx(j)) is in ascending order for j = 1, 2, . . ., N *
* The input quantities n and arr are not changed. *
* *
* From "NUMERICAL RECIPES IN FORTRAN77: *
* THE ART OF SCIENTIFIC COMPUTING" (ISBN 0-521-43064-X) *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
integer n,indx(n),M,NSTACK
real arr(n)
parameter (M=7,NSTACK=50)
integer i,indxt,ir,itemp,j,jstack,k,l,istack(NSTACK)
real a
do j=1,n
indx(j)=j
end do
jstack=0
l=1
ir=n
1 if (ir-l.lt.M) then
do j=l+1,ir
indxt=indx(j)
a=arr(indxt)
do i=j-1,l,-1
if (arr(indx(i)).le.a) goto 2
indx(i+1)=indx(i)
end do
i=l-1
2 indx(i+1)=indxt
end do
if (jstack.eq.0) return
ir=istack(jstack)
l=istack(jstack-1)
jstack=jstack-2
else
k=(l+ir)/2
itemp=indx(k)
indx(k)=indx(l+1)
indx(l+1)=itemp
if (arr(indx(l)).gt.arr(indx(ir))) then
itemp=indx(l)
indx(l)=indx(ir)
indx(ir)=itemp
end if
if (arr(indx(l+1)).gt.arr(indx(ir))) then
itemp=indx(l+1)
indx(l+1)=indx(ir)
indx(ir)=itemp
end if
if (arr(indx(l)).gt.arr(indx(l+1))) then
itemp=indx(l)
indx(l)=indx(l+1)
indx(l+1)=itemp
end if
i=l+1
j=ir
indxt=indx(l+1)
a=arr(indxt)
3 continue
i=i+1
if (arr(indx(i)).lt.a) goto 3
4 continue
j=j-1
if (arr(indx(j)).gt.a) goto 4
if (j.lt.i) goto 5
itemp=indx(i)
indx(i)=indx(j)
indx(j)=itemp
goto 3
5 indx(l+1)=indx(j)
indx(j)=indxt
jstack=jstack+2
if (jstack.gt.NSTACK) stop 'NSTACK too small in indexx'
if (ir-i+1.ge.j-l) then
istack(jstack)=ir
istack(jstack-1)=i
ir=j-1
else
istack(jstack)=j-1
istack(jstack-1)=l
l=i
end if
end if
goto 1
end
************************************************************************
subroutine pcc(xpred,xact,nbasis,nahead,cc)
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* Calculates the square of the Pearson correlation coefficient *
* to determine the error of the points predicted by the SPM from the *
* actual time series generated by the Penna model. *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
implicit none
real*4 xact(1000)
integer*4 xpred(1000)
real*4 sumAct,sumPred
real*4 countAct,countPred
real*4 meanAct,meanPred
real*4 diffAct(500),diffPred(500)
real*4 prodNum(500)
real*4 sumProd
real*4 sumActDen,sumPredDen
real*4 diffActDen(500),diffPredDen(500)
real*4 cc,num,den
integer*4 tp,nahead
integer*4 nbasis
sumPred = 0.0
countPred = 0.0
do tp = 1,nahead
sumPred = sumPred + xpred(nbasis+tp)
countPred = countPred + 1
end do
meanPred = sumPred/countPred
sumAct = 0.0
countAct = 0.0
do tp = 1,nahead
sumAct = sumAct + xact(nbasis+tp)
countAct = countAct + 1
end do
meanAct = sumAct/countAct
do tp = 1,nahead
diffPred(tp) = xpred(nbasis+tp) - meanPred
diffAct(tp) = xact(nbasis+tp) - meanAct
prodNum(tp) = diffPred(tp)*diffAct(tp)
end do
num = 0.0
do tp = 1,nahead
num = num + prodNum(tp)
end do
sumPredDen = 0.0
sumActDen = 0.0
do tp = 1,nahead
diffPredDen(tp) = (xpred(nbasis+tp) - meanPred)**2
diffActDen(tp) = (xAct(nbasis+tp) - meanAct)**2
sumPredDen = sumPredDen + diffPredDen(tp)
sumActDen = sumActDen + diffActDen(tp)
end do
den = SQRT(sumActDen*sumPredDen)
cc = (num/den)**2
end
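Subroutine pcc computes r^2 = [sum_i (p_i - <p>)(a_i - <a>)]^2 / [sum_i (p_i - <p>)^2 * sum_i (a_i - <a>)^2] over the nahead points that follow the first nbasis points, where p are predicted values, a are actual values and <.> denotes a mean. The same quantity can be written compactly with array sections; the following is a minimal sketch with the hypothetical function name r_squared, assuming real-valued predictions.

```fortran
! Sketch of the squared Pearson correlation used to score forecasts.
! Names are illustrative; the thesis routine is pcc above.
module pcc_sketch
  implicit none
contains
  ! Squared Pearson correlation between pred and act over the points
  ! nbasis+1 .. nbasis+nahead.
  real function r_squared(pred, act, nbasis, nahead)
    real, intent(in) :: pred(:), act(:)
    integer, intent(in) :: nbasis, nahead
    real :: p(nahead), a(nahead)
    real :: num, sp, sa
    p = pred(nbasis+1:nbasis+nahead)
    a = act(nbasis+1:nbasis+nahead)
    p = p - sum(p)/real(nahead)      ! deviations from the predicted mean
    a = a - sum(a)/real(nahead)      ! deviations from the actual mean
    num = sum(p*a)                   ! covariance numerator
    sp = sum(p**2)
    sa = sum(a**2)
    r_squared = num**2/(sp*sa)       ! equals (num/sqrt(sp*sa))**2
  end function r_squared
end module pcc_sketch
```

For pred = (1, 2, 3) and act = (1, 3, 2) with nbasis = 0 and nahead = 3, r_squared returns 0.25. Because the coefficient is squared, a forecast that is perfectly anticorrelated with the actual series also gives r^2 = 1, so the sign of num may be worth checking separately.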
Appendix B
Effect of Randomization on Penna populations and on the appropriate SPM parameters obtained
a high embedding dimension value represents greater complexity: the greater the dispersion in the return maps, the more complex the system. Since both populations, generated with different seed values, fluctuate by the same amount about the same mean value, we expect their embedding dimensions not to differ greatly. For the populations described by Figure B.1, the appropriate embedding dimension is the same for both, e = 3. The appropriate lag time varies only slightly between the two populations: it is 5 for the population with seed = 1 and 7 for the population with seed = 2. This variation is small enough that the randomization has practically no effect on the appropriate SPM parameters obtained.
[Figure: Population N(t) versus time t (0 to 1000) for Penna populations generated with seed = 1 and seed = 2.]
Bibliography
[3] E. E. Peters. Chaos and Order in the Capital Markets. John Wiley and Sons, 1991.
[6] C. Alexander and I. Giblin. Creating Order Out of Chaos. RISK Magazine, 7(6):71-76, 1994.
[9] M. Casdagli. Chaos and Deterministic versus Stochastic Non-linear Modelling. Journal of the Royal Statistical Society, Series B, 54:303-328, 1992.
[17] K. Malarz. Searching for Scaling in the Penna Bit-String Model of Biological Aging. International Journal of Modern Physics C, 2000.
[19] J. S. Sa Martins and S. Moss de Oliveira. Why Sex? Monte Carlo Simulations of Survival After Catastrophes. International Journal of Modern Physics C, 9:421-432, 1998.
[21] M. He, J. Lin, H. Jiang and X. Liu. The Two Populations Cellular Automata Model with Predation Based on the Penna Model. Physica A, 312:243-250, September 2002.
[26] J. E. Cohen. How Many People Can the Earth Support? W. W. Norton and Co., New York, 1995.
[28] J. B. Coe and Y. Mao. Solvable Senescence Model Showing a Mortality
Plateau. Phys. Rev. E 89, 2002.
[34] D. Holton and R. May. The Nature of Chaos, ed. by T. Mullin. Oxford
University Press, USA, 1993.