
CHARACTERIZATION OF THE PENNA MODEL BY

SIMPLEX PROJECTION METHOD

by

STEPHANIE BANARES IBO

An Undergraduate Thesis Submitted to the


National Institute of Physics

College of Science
University of the Philippines
Diliman, Quezon City

As Partial Fulfillment of the Requirements


for the Degree of
Bachelor of Science in Applied Physics

April 2007
CERTIFICATION
This is to certify that this undergraduate thesis entitled "Characterization of the Penna Model by Simplex Projection Method", submitted by Stephanie Banares Ibo to fulfill part of the requirements for the degree of Bachelor of Science in Physics, was successfully defended and approved on March 27, 2007.

RONALD S. BANZON, Ph.D.


Assistant Professor of Physics
National Institute of Physics
College of Science
U.P. Diliman, Quezon City

JOHNROB BANTANG, Ph.D.
National Institute of Physics
College of Science
U.P. Diliman, Quezon City

BHAZEL ANNE RARA, M.Sc.
National Institute of Physics
College of Science
U.P. Diliman, Quezon City

The National Institute of Physics endorses acceptance of this undergraduate thesis as partial fulfillment of the requirements for the degree of Bachelor of Science in Physics.

ARNEL A. SALVADOR, Ph.D.


Director
National Institute of Physics

This undergraduate thesis is hereby officially accepted as partial fulfillment of the requirements for the degree of Bachelor of Science in Physics.

CAESAR A. SALOMA, Ph.D.


Dean, College of Science

ABSTRACT

CHARACTERIZATION OF THE PENNA MODEL BY


SIMPLEX PROJECTION METHOD

Stephanie Banares Ibo
University of the Philippines, 2007

Adviser: Ronald S. Banzon, Ph.D.

The Penna model is characterized by series reconstruction using the simplex projection method (SPM). The main task is to determine, in each regime (chaotic or non-chaotic), how SPM best represents the Penna model. The appropriate simplex projection parameters, e (embedding dimension) and τ (lag time), are determined in pairs within the complexity map of the Penna parameter space using a statistical measure, the Pearson correlation coefficient. For each regime, the appropriate SPM parameter sets are correlated with the Penna parameters (birth rate, b; mutation threshold, th; and reproductive age, r). The behavior of the appropriate SPM parameters (e-τ pairs) with respect to varying Penna parameters is observed within the chaotic and non-chaotic regimes. For the chaotic regime, e is a weak function of b, th and r. The lag time, τ, on the other hand, generally has a value close to 10, the short-term period due to component ages found in a recent study [29]. Results in this regime are consistent with the conjecture that a high e value means greater complexity within the population to which SPM was applied [36]. In the non-chaotic regime, correlation is done only between the Penna parameter r and the SPM parameters e and τ. It was found that the appropriate embedding dimension, e, increases while the lag time, τ, decreases as r increases.

PACS: 87.23.Cc [Population dynamics and ecological pattern formation], 87.23.Kg [Dynamics of evolution], 89.75.-k [Complex systems]

Table of Contents

Abstract

1 Introduction

2 Theory and Review of Related Literature
2.1 Chaotic Dynamics
2.2 The Penna Model
2.3 Measures of Chaos within the Penna Model
2.4 Periodicity within the Penna Model through Age demographics

3 Methodology
3.1 Penna Model Implementation
3.2 The Simplex Projection Method
3.3 Penna and SPM parameter correlation

4 Results and Discussion
4.1 The Appropriate SPM Parameters in the Chaotic Regime
4.1.1 The Embedding Dimension, e, in relation to the Penna Parameters: b, th and r
4.1.2 The Lag Time, τ, in relation to the Penna Parameters: b, th and r
4.1.3 Summary of Results for the Chaotic Regime
4.2 The Appropriate SPM Parameters in the Non-Chaotic Regime
4.2.1 The Embedding Dimension, e, and the lag time, τ, in relation to the Penna Parameter: r
4.2.2 Summary of Results for the Non-chaotic Regime

5 Conclusion

A Programming Details
A.1 Independent Penna Population Generation
A.2 Distribution and Period of the Random Number Generator
A.3 Simplex Projection and Statistical Evaluation

B Effect of Randomization on Penna populations and on the appropriate SPM parameters obtained

Chapter 1

Introduction

The Penna model is a bit-handling technique that simulates ageing [1]. The model is based on the mutation accumulation theory. This theory of ageing states that, for most of the adult period, the intensity of natural selection maintaining viability, survival and fertility must decline with age [2]. Thus, at old age, selection is less able to prevent the harmful effects brought about by new mutations. These new mutations have adverse effects on the health of an individual, possibly leading to death depending upon the threshold of that individual. The Penna model is presently considered the most successful computational model for age-structured populations, as it is able to reproduce data from population series found in nature.

The simplex projection method (SPM) is a trajectory reconstruction based on the method of delays, wherein a multidimensional signal is obtained from a scalar time series. The merit of this method is that it requires no prior information about the system to which it is applied. It has been used in a wide range of applications: in optics, in signal processing and even on biological populations with real time data series. It has also been used to reconstruct population time series generated by the Penna model.

In this work, the simplex projection method is used to characterize the Penna model. This is done with the use of the parameters that represent the method of reconstruction (SPM) and the model itself (Penna). The parameters of the method and the model are correlated. The SPM has two main parameters, the embedding dimension, e, and the time delay or lag time, τ, while the Penna model has three primary parameters (birth rate, b; mutation threshold, th; and reproductive age, r) which represent the population generated. The SPM parameters are chosen such that they give the most accurate reconstruction.

Owing to the complexity of the model itself, chaotic regimes occur within the population generated by the Penna model. These chaotic regimes depend on the Penna parameters that define the population. With this idea, the correlation between the Penna and the SPM parameters can be done in two regimes: chaotic and non-chaotic. The behavior of the SPM parameters is observed as the Penna parameters are varied within each regime, since variation of the Penna parameters is what causes chaoticity within the population.

Chapter 2

Theory and Review of Related Literature

2.1 Chaotic Dynamics


Chaotic dynamics is a vital key to providing good forecasts in many of today's most interesting fields, especially in economics. Forecasting from mathematical models based on chaotic dynamics is an advancing area of high-tech research owing to chaos in capital markets. Opinions vary as to whether these mathematical models can identify an underlying chaotic system within financial time series and thus provide good predictions that would exploit consequent investment opportunities. There are those who claim to have found chaos in financial markets [3], others who claim otherwise [4], and those who are simply undecided [5].
Many algorithms [6] were devised in order to perform such forecasts. The first among these algorithms is based on derivatives of the correlation exponent algorithm devised by Grassberger and Procaccia [7]. Then, Casdagli pioneered the use of radial basis functions, which have much in common with neural networks but run in a much shorter amount of time [8]. This work has given rise to a new and more efficient type of algorithm that can distinguish chaos from random behavior in time series of 10^d points, where d is the dimension of the attractor. Hence, with this new algorithm, low-dimensional chaotic systems may be detected with only a small amount of data, say 1000 data points. It is with this argument that we can justify the use of a relatively short time series (of 1000 data points) to provide accurate forecasts.
These new algorithms provide accurate short-term forecasts if the time series is found to be chaotic, and are therefore loosely termed time delay prediction methods. The method of forecasting varies: in Casdagli [9] the forecasts are based on parametric linear regression; Nychka et al. [10] use nonparametric regression to find consistent estimates of the Lyapunov exponents; Sugihara and May [11] suggest a non-parametric simplex projection method; Alexander and Giblin [12] modified the Sugihara and May algorithm to use barycentric coordinates. In all of these methods, if the system is chaotic, the correlation between actual and forecasted values will decline as the number of points to predict increases, and very short-term forecasts will be quite accurate. On the other hand, if the system is purely random, no such decrease in prediction accuracy will be evident, and for many financial returns series even the one-step-ahead predictions will be uncorrelated with the actual returns.

Chaos measures from a time series

There are a number of ways to measure the degree of chaos within non-linear systems. Such methods include Poincaré sections, Lyapunov characteristic exponents (LCE), and fractal dimensions [13]. However, the problem of determining quantities such as LCEs or fractal dimensions from experimental measurements is entirely different from measuring these quantities in a mathematical (numerical) investigation. Fortunately, there is a partial answer to this problem that has been applied successfully to a large number of experimental investigations [14, 15]. The key idea is to replace the phase space trajectory such as:

\mathbf{x}(t) = [x_1(t), x_2(t), \ldots, x_n(t)]    (2.1)

by an artificial phase space given by

\mathbf{y}(t) = [y(t), y(t + \Delta t), \ldots, y(t + m \Delta t)]    (2.2)

where y(t) is any one of the phase space variables x_i(t) or a functional combination of these variables. Thus, from a set of measurements of a single quantity, y(t), a sequence of points such as the following can be constructed in the artificial phase space:

\mathbf{x}(t) = [y(t), y(t + \Delta t), \ldots, y(t + m \Delta t)]    (2.3)

\mathbf{x}(t + \Delta t) = [y(t + \Delta t), \ldots, y(t + (m + 1) \Delta t)]    (2.4)

\vdots
There are two concerns in the use of this method: the choice of Δt and the choice of m. If Δt is too small, then y(t), y(t + Δt), ... are not linearly independent, which makes the reconstructed trajectory in the artificial phase space non-linear. If, on the other hand, Δt is too large, i.e., much larger than the information decay time, then there is no dynamical relation between the points. The second issue concerns the choice of m. Usually the correlation dimension, d_corr, is computed for a series of values m = 1, 2, .... At first the correlation dimension is directly proportional to m. However, once the dimension of the artificial phase space is large enough, the correlation dimension saturates and becomes constant. The minimum value of m for which the correlation dimension saturates is denoted m_0, and e = m_0 + 1 is referred to as the embedding dimension. The quantity e represents the minimum dimensionality of the artificial phase space necessary to include the attractor.
The general idea discussed above has been developed further to provide accurate trajectory reconstruction. In this work, the method of Sugihara and May, i.e., the simplex projection method, was implemented. Within this method, the choice of m and Δt translates to the choice of the SPM parameters: the embedding dimension, e, and the lag time, τ.
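
As a minimal sketch of the delay reconstruction described in this section, the Python snippet below builds the artificial phase space points from a scalar series. The function name delay_embed and its arguments are assumptions made for illustration (the delay is expressed as an integer number of samples, lag); this is not the code used in this work.

```python
import numpy as np

def delay_embed(y, m, lag):
    """Return delay vectors [y[k], y[k+lag], ..., y[k+m*lag]] as rows."""
    y = np.asarray(y, dtype=float)
    n_vectors = len(y) - m * lag
    if n_vectors <= 0:
        raise ValueError("series too short for this m and lag")
    # Column j holds the series shifted forward by j*lag samples.
    columns = [y[j * lag : j * lag + n_vectors] for j in range(m + 1)]
    return np.column_stack(columns)

# Toy series, for illustration only.
series = np.arange(10)
vectors = delay_embed(series, m=2, lag=3)
```

Each row has m + 1 components, so the dimension of the reconstructed space is m + 1, in line with e = m_0 + 1 above.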

2.2 The Penna Model


The Penna model is a population model that simulates ageing by bit-handling techniques. It was first introduced by Thadeu Penna in 1995 [1]. The model has its biological justification in one of the major ageing theories, the mutation accumulation theory. This theory states that mutations affecting old ages [2] accumulate due to weaker reproductive selection. The Penna model is presently considered the most successful computational model for age-structured populations [16] and has been applied to natural time series of different species [17].
The Penna model was originally implemented to represent a single-species, non-immigrating and asexually reproducing population. Since then, different studies have extended the application of the model to sexually reproducing populations [18, 19], to lattices [20], interspecies interaction [21] and cellular automata [21], among others. Still, the model has its most powerful application in simulating population ageing, owing to the model's simplicity and its crucial role in representing¹ the genome of an individual from a given species population.
An individual's genome is represented by a bit string², each bit of which has a value of 1 or 0. A healthy gene is represented by 0 and a bad or mutated gene by 1. An individual initially has a healthy bit string, that is, a bit string composed entirely of zeroes. There are several important parameters that define a Penna population:

1. reproductive age, r; the age at which an individual may start giving birth.

2. birth rate, b; the preset number of offspring at each reproduction.

3. mutation threshold, th; the number of allowed mutations³ before an individual dies; 1 ≤ th ≤ 8⁴.

4. the Verhulst factor; the fraction of K (carrying capacity⁵) still available for population growth:

V = 1 - N(t)/K    (2.5)

The Verhulst factor takes into account competition for resources such as space and food. It is rooted in the concept of the carrying capacity, K: there are finite resources within an environment, which calls for competition within the population. The Verhulst factor is implemented in the Penna model through the random death procedure. During each time step, a random number between zero and one is generated and compared with V. If it is greater than V, the individual dies independently of its age and genome. Otherwise, it continues to live.

¹ With computer bit strings (the original model used 32 bits).
² In this work, an 8-bit string implementation was used.
³ In most works, only deleterious or harmful mutations [22, 23] are considered, since these occur more frequently in nature than positive or beneficial mutations [24, 25].
⁴ Having th = 8 is the same as eliminating the effect of the mutation threshold, since the maximum lifespan is 8, which is equal to the bit string length used.
⁵ The sustainable population size, or the maximum number of individuals that an environment can support for a long time [26].
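
The random death procedure described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and the population and capacity values are assumptions chosen for the example, not taken from the implementation used in this work.

```python
import random

def verhulst_factor(population_size, carrying_capacity):
    """V = 1 - N(t)/K, the fraction of the carrying capacity still available."""
    return 1.0 - population_size / carrying_capacity

def survives_random_death(population_size, carrying_capacity, rng=random):
    """Draw a uniform number in [0, 1); the individual dies if it exceeds V."""
    v = verhulst_factor(population_size, carrying_capacity)
    return rng.random() <= v

# Hypothetical numbers: N = 2500 individuals, K = 10000.
rng = random.Random(0)
n, k = 2500, 10000
survivors = sum(survives_random_death(n, k, rng) for _ in range(n))
```

With V = 0.75, roughly three quarters of the individuals are expected to survive a step, whatever their age or genome.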
The need for the random death procedure through the implementation of the Verhulst factor stems from the limited capability of ageing models, including the Penna model, to represent the complex nature of the life and death cycles of populations found in nature [27]. Although the random death procedure is a vital part of the Penna model, there is no biological justification for it, because every individual in a population generated with such an implementation has an equal survival probability. Simply put, all individuals, however fit, die with equal probability; this is not observed in real systems. It is on this argument that Martins and Cebrat [27] based their suggested modification of the random death procedure.
In the modified random death procedure, the Verhulst factor is applied only to newborns. This takes into account the ability of older individuals to adapt to their environment. This implementation is also supported by the theory of natural selection, wherein fitter individuals have a higher probability of survival⁶. With this modification, the limiting process of the random death procedure through the Verhulst factor is maintained. The populations generated with this modified implementation are termed VB populations, while the original implementation is termed the VA implementation. In this work, the VB implementation is utilized.

⁶ Some studies have shown that population saturation occurs at advanced ages by introducing a Fermi survival function [28].

2.3 Measures of Chaos within the Penna Model


Return Maps

Return maps [30] are graphical tools used to illustrate the attractors of a dynamical system. A return map is obtained by plotting x_{n+z} against x_n, where z = 1, 2, 3, ...⁷, for a set of data such as the population through time generated by the Penna model. Through return maps, the attractors can be seen easily. Chaos is represented by multiple attractors/fixed points, as shown by x_n vs. x_{n+1} plots (first return maps, z = 1).
In the Penna model, return maps were able to show the stability of the system as parameters are varied. For example, a previous study showed that as the value of the birth rate, b, is increased⁸, high fluctuations occur, suggesting chaotic regimes for that population. This is verified by the return map shown in Figure 2.1. For b = 1, there is a single attractor and thus the system is stable. For b = 6, there are multiple attractors, which suggests a chaotic regime.
In the same study, it was shown that high fluctuations occur for high threshold values, again suggesting chaotic regimes. This was also verified by the return map shown in Figure 2.2. At th = 8, multiple attractors occur, suggesting a chaotic regime.

⁷ The first return map, for which z = 1, is enough to display the attractors of the system and thus gauge the stability of the system.
⁸ Even at b = 2, fluctuations become prominent.
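
As an illustration of how the points of a return map are formed, the sketch below pairs each value of a series with the value z steps later. The helper name return_map and the short toy series are assumptions for this example; in practice the input would be a population time series from the Penna model.

```python
import numpy as np

def return_map(x, z=1):
    """Return the pairs (x_n, x_{n+z}) that make up the z-th return map."""
    x = np.asarray(x, dtype=float)
    if z < 1 or z >= len(x):
        raise ValueError("z must satisfy 1 <= z < len(x)")
    return np.column_stack([x[:-z], x[z:]])

# Toy values only, standing in for a normalized population series.
toy_series = [1.0, 1.2, 0.9, 1.1, 1.0]
first_map = return_map(toy_series, z=1)
second_map = return_map(toy_series, z=2)
```

Plotting the second column against the first gives the return map; a single tight cluster of points corresponds to one attractor.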

[Figure 2.1: plot of x(n+1) (vertical axis) against x(n) (horizontal axis).]

Figure 2.1: Return maps for the 8-bit VB population. Parameters are b = 1, 6 (* and squares respectively), r = 2, th = 2. [30]

[Figure 2.2: plot of x(n+1) (vertical axis) against x(n) (horizontal axis).]

Figure 2.2: Return maps for the 8-bit VB population with a high th. Parameters are b = 1, r = 2, th = 1, 8 (solid squares and empty circles respectively). [30]

2.4 Periodicity within the Penna Model through Age demographics

In population models, demography has been an important field of study due to its effect on the growth and decline of populations. In particular, the Penna model is a population model based on the theory of senescence, wherein deleterious mutations are more likely to manifest at advanced ages. Such mutations have adverse effects on an individual later in life and therefore on a population's survival and fertility [2]. It is with this idea that age structure becomes an important feature of demography. Gompertz's law of exponential increase in mortality demonstrates an exponential decrease of the population with respect to the population's component ages. An important demographic feature related to age structure in population models is the generation time, the time between an individual's birth and the time at which it is able to produce its own offspring (the reproductive age). In a previous study [29], the age structure⁹ within Penna populations was investigated. This study was done to further investigate the observed cyclic pattern [30] and to verify the suggested periodicity [31] found within the population.

⁹ Age structure describes the distribution of a population at a particular time over its component ages, that is, the normalized number of individuals plotted with respect to age.

Chapter 3

Methodology

3.1 Penna Model Implementation


An 8-bit Penna model is utilized to generate a population of a non-migrating and asexually reproducing single species of organisms. The genome of each individual in this population is represented by a bit string of length 8. Each bit has a default value of 0, which represents a healthy or normal gene. When a mutation occurs, a normal gene represented by bit 0 is flipped to a value of 1.
One bit is read per time step of the simulation, each time step corresponding to a year in the life of an individual. Since the implementation uses 8-bit strings to represent an individual's genome, an individual has a maximum lifetime of eight years. An individual starts producing a preset number of offspring, b, per time step once it reaches a preset reproductive age, r. The offspring is an exact copy of the parent until a random mutation at birth is applied. This random mutation can cause the newborn to acquire a bad (mutated) gene, so that the newborn's set of genes deviates from the parent's.
Death occurs from any of three causes: 1) the individual reaches the maximum lifetime of eight time steps set by the bit string length; 2) the individual reaches the preset number of allowed mutations, th; 3) a newborn dies when a random number generated between zero and one is greater than the Verhulst factor, V.¹
In this work, the Penna model is characterized in two regimes: chaotic and non-chaotic. The populations under study are therefore those having parameters (b, r, th) corresponding to population time series that are chaotic or non-chaotic in nature². For this study, increasing the parameter b will, for the most part, give the chaotic regime of the Penna populations. The population time series corresponding to each regime is then characterized by reconstructing the time series and looking into its predictability³. Different methods/algorithms (see section 2.1) could be used to make short-term predictions based on a library of patterns within the series. In this case, the simplex projection method is employed.
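
The implementation described in this section can be illustrated with the following Python sketch of an 8-bit, asexual Penna population with the Verhulst factor applied to newborns only. Several details are assumptions made for the sketch and are labeled as such in the comments: the mutation rule (one random bit set to 1 at birth), the convention that bit k is read at age k + 1, the initial all-healthy population, and the default values of the carrying capacity and initial size. It is not the program used in this work.

```python
import random

GENOME_BITS = 8  # 8-bit strings, as in this work

def penna_step(population, b, r, th, capacity, rng):
    """Advance the population by one time step and return the new population.

    Each individual is a (genome, age) pair; bit k is read at age k + 1 (assumed).
    """
    verhulst = 1.0 - len(population) / capacity  # V = 1 - N(t)/K
    survivors, newborns = [], []
    for genome, age in population:
        age += 1
        if age > GENOME_BITS:
            continue  # maximum lifetime reached
        expressed = bin(genome & ((1 << age) - 1)).count("1")
        if expressed >= th:
            continue  # mutation threshold reached
        survivors.append((genome, age))
        if age >= r:
            for _ in range(b):
                # Assumed mutation rule: one random bit is set to 1 at birth.
                child = genome | (1 << rng.randrange(GENOME_BITS))
                # VB implementation: the Verhulst factor acts on newborns only.
                if rng.random() <= verhulst:
                    newborns.append((child, 0))
    return survivors + newborns

def penna_series(b, r, th, capacity=10000, n0=1000, steps=1000, seed=0):
    """Population size per time step, starting from all-healthy newborns (assumed)."""
    rng = random.Random(seed)
    population = [(0, 0)] * n0
    series = []
    for _ in range(steps):
        population = penna_step(population, b, r, th, capacity, rng)
        series.append(len(population))
    return series

# Small run for illustration; the sizes here are arbitrary.
series = penna_series(b=1, r=2, th=2, capacity=2000, n0=200, steps=100)
```

The list returned by penna_series plays the role of the population time series that is later reconstructed.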

3.2 The Simplex Projection Method


The simplex projection method is a basic time delay method of prediction which was primarily used to identify chaotic series. The method is based on a theorem of Takens [15] which shows that it is possible to estimate the dimension of any chaotic attractor by embedding the time series in an e-dimensional space. These attractors with chaotic or turbulent behavior were termed strange attractors by Ruelle and Takens [32]. They then went on to conjecture that these strange attractors are the cause of turbulent behavior in fluid flow [14].

¹ Since the Penna implementation used here is the VB implementation, the Verhulst factor (equation (2.5)) is applied only to newborns; this implementation of the Penna model was taken from previous studies and has also been used recently [30, 29], as discussed in section 2.2.
² A recent study [30] showed that an increase in the birth rate, b, results in fluctuations which suggest a chaotic regime within the dynamics of the population system.
³ That is, the goodness of fit of the reconstructed time series with the original time series generated by the Penna model.
SPM has since been used in different applications. It has been used to distinguish chaos from measurement error [11, 33] and from uncorrelated noise in time series data [34]. It has also been used for resolution enhancement and signal recovery of Raman spectra [35]. SPM has found a number of applications due to its ability to provide accurate short-term predictions even when the dynamics or the mathematical model of the time series is not known a priori.
For this work, the SPM was used to reconstruct the population time series generated by the Penna model implementation discussed in the preceding section. The SPM was implemented [36] as follows. The time series generated by the Penna model, which consists of 1000 data points, is divided into two parts: the first 500 points serve as the basis points from which a library of patterns is determined, and this library is then used to predict the next 500 points. The method is discussed below in stepwise fashion.
The basis points from the original time series are given by

x_i = \{x_1, x_2, x_3, \ldots, x_{500}\}    (3.1)

from which the next 500 points are predicted; that is, x_i for 500 < i ≤ 1000 are predicted.
Now that the basis points are established, an embedding dimension, e, must be chosen. The embedding dimension determines the number of components of the point in e-space that represents one data point in the real time series. For example, if e is set to 3, each point in real space, x_i, i = 1, 2, ..., N, is a 3-component (e-component) or 3-dimensional point in e-space⁴:

x_i^e = \{x_i, x_{i-\tau}, \ldots, x_{i-(e-1)\tau}\}    (3.2)

Next, a value for the time delay or lag time, τ, must be chosen. As an example, if τ is set to 1 with e = 3, eq. (3.2) becomes

x_i^3 = \{x_i, x_{i-1}, x_{i-2}\}    (3.3)
The point x_i (i > N) to predict and the number of steps into the future, t_p, to use are then chosen. For example, for N = 500, if x_501 is to be determined with t_p = 2, the e-dimensional point

x_{499}^{e=3} = \{x_{499}, x_{498}, x_{497}\}    (3.4)

is used to determine x_501. Likewise, if t_p = 10,

x_{491}^{e=3} = \{x_{491}, x_{490}, x_{489}\}    (3.5)

is used.
The N points are then plotted in the e-dimensional space, after which the nearest neighbors of the initial reference (that is, the points x_{499}^{e=3} and x_{491}^{e=3} in the previous examples) are determined. There are e + 1 nearest neighbors (excluding the reference, as it is not its own neighbor). This is done while keeping track of the index n of each neighbor x_n^e. One then has x_{n_1}^e, x_{n_2}^e, ..., x_{n_{e+1}}^e as the nearest neighbors.
The prediction is the mean of the neighbors advanced by t_p steps, taking only the most forward component of each:

x_{pred} = \frac{x_{n_1 + t_p} + x_{n_2 + t_p} + \ldots + x_{n_{e+1} + t_p}}{e + 1}    (3.6)

where x_pred is the predicted point. As mentioned earlier, for this implementation x_pred ranges from x_501 to x_1000. From these predicted values, the original time series can be reconstructed.

⁴ Note that for the N elements of eq. (3.1), there are N - e corresponding e-dimensional points.
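
The stepwise procedure above can be sketched in Python as follows. The function names, the 0-based indexing and the toy periodic series are assumptions made for illustration; the prediction uses the plain mean of Eq. (3.6) over the e + 1 nearest library neighbors, and the sketch is not the program used in this work.

```python
import numpy as np

def embed_point(x, i, e, lag):
    """Delay vector {x_i, x_(i-lag), ..., x_(i-(e-1)*lag)} with 0-based index i."""
    return x[i - lag * np.arange(e)]

def simplex_predict(x, target, e, lag, tp, n_library):
    """Predict x[target] from the e + 1 nearest library neighbors of the
    reference point at index target - tp, using the plain mean of Eq. (3.6)."""
    x = np.asarray(x, dtype=float)
    ref = target - tp
    first = (e - 1) * lag  # earliest index with a complete delay vector
    if ref < first:
        raise ValueError("reference point has no complete delay vector")
    # Candidates must have their tp-step-ahead value inside the library;
    # the reference itself is excluded, as it is not its own neighbor.
    candidates = np.array([n for n in range(first, n_library - tp) if n != ref])
    if len(candidates) < e + 1:
        raise ValueError("library too short for this e, lag and tp")
    vectors = np.array([embed_point(x, n, e, lag) for n in candidates])
    distances = np.linalg.norm(vectors - embed_point(x, ref, e, lag), axis=1)
    nearest = candidates[np.argsort(distances)[: e + 1]]
    return x[nearest + tp].mean()

# Toy periodic series (illustration only): library = first 100 of 200 points.
x = np.tile([1.0, 2.0, 4.0, 3.0], 50)
prediction = simplex_predict(x, target=150, e=2, lag=1, tp=1, n_library=100)
```

For a strictly periodic toy series the neighbors coincide with the reference point, so the prediction reproduces the actual value; for a chaotic series the agreement is expected to degrade as tp grows.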

3.3 Penna and SPM parameter correlation
To be able to characterize the Penna model through the simplex projection method (SPM), a relationship between their parameters must be established, since their respective parameters represent the model and the method. For the Penna model, parameter sets are chosen so as to characterize the model with increasing values of the said parameters. The parameters are also chosen to correspond to the non-chaotic and chaotic regimes [30]. The Penna parameters considered are:

1. reproductive age, r; variation of this parameter does not have any effect on the chaoticity of the VB populations generated by the Penna model.

2. mutation threshold, th; a previous study [30] showed that, for the VB implementation of the Penna model, an increase in th results in pronounced fluctuations within the population time series and thus the suggested appearance of chaos.

3. birth rate, b; as with the VA population, an increase in b causes fluctuations within the population time series and thus the appearance of chaos within the system.

With the chosen parameters, the generated population time series is then reconstructed using SPM. Reconstruction of the time series was done over a certain range⁵ of the following SPM parameters:

1. embedding dimension, e

2. time delay or lag time, τ

To determine the appropriate SPM parameters, e and τ, for each system, and for systems in both the non-chaotic and chaotic regimes, a statistical measure must be used. It was shown [36] that the square of the Pearson correlation coefficient, ρ², gives definitive results, so this statistical measure was employed in our analysis. The Pearson correlation coefficient is given by:

\rho(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\left[ \sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2 \right]^{1/2}}

where x_i corresponds to the actual population values while y_i gives the predicted values from the SPM. Their corresponding mean values are given by \bar{x} and \bar{y}, respectively.
ρ² is then plotted as a function of the SPM parameters e and τ. The SPM parameters corresponding to the highest value of ρ² give the appropriate parameter set for the reconstructed population series. Once the appropriate SPM parameter sets (a pair of e and τ taken together) are determined for each regime, they can then be examined for correlation with the Penna parameters.

⁵ This is the scanning range, the range of values of e and τ from which the appropriate e-τ pair is to be selected.
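
The statistical measure can be written directly from the formula for the Pearson correlation coefficient. The sketch below is illustrative only; the function name pearson_squared and the sample numbers are assumptions, not data from this work.

```python
import numpy as np

def pearson_squared(actual, predicted):
    """Square of the Pearson correlation coefficient between two series."""
    x = np.asarray(actual, dtype=float)
    y = np.asarray(predicted, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    rho = np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))
    return rho ** 2

# Made-up values, used only to exercise the function.
actual = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [2.0, 1.0, 4.0, 3.0, 5.0]
rho2 = pearson_squared(actual, predicted)
```

Squaring removes the sign of the correlation, so ρ² lies between 0 and 1 and can be compared across reconstructions.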

Chapter 4

Results and Discussion

To be able to characterize the Penna populations by the simplex projection method (SPM), the first task is to determine the appropriate SPM parameters: (1) the embedding dimension, e, being the primary parameter, along with (2) the lag time, τ. Thus, the appropriate SPM parameters are always taken in pairs. Each pair of e and τ gives a reconstruction of the Penna population time series, which has a specific set of parameters (birth rate, b; mutation threshold, th; and reproductive age, r). Each reconstructed series is compared to the original time series through the Pearson correlation coefficient. The pair of e and τ that gives the highest correlation coefficient is the appropriate set of SPM parameters.

The embedding dimension, e, is varied over a range of 3 to 30¹. Along with e, τ is varied over a range of 1 to 14. This range was chosen because of the possible effect of the short-term periodicity found in a previous study [29], also discussed earlier in section 2.4. Thus, there are 30 × 14 reconstructed series, each with a different pair of e and τ, for a Penna population with a particular set of parameters: r, b and th. The reconstructed series which yields the maximum correlation coefficient gives the appropriate e and τ. The determination of the appropriate SPM parameters is done according to the regime to which a population series belongs: the chaotic regime or the non-chaotic regime of the Penna model.

¹ This range was found to be the range where the appropriate embedding dimension occurs for VB populations [36].
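
The scan over pairs of e and τ described above can be sketched as a double loop that reconstructs the second half of a series for each pair and keeps the pair with the highest ρ². Everything named here (simplex_forecast, scan_spm_parameters, the fixed t_p = 1, the use of observed values to build each reference point, and the toy sine series) is an assumption made for illustration rather than the procedure actually coded for this work.

```python
import numpy as np

def simplex_forecast(x, e, lag, tp, n_library):
    """Forecast x[n_library:] as the mean over the e + 1 nearest library neighbors."""
    offsets = lag * np.arange(e)
    first = (e - 1) * lag
    lib_idx = np.arange(first, n_library - tp)  # neighbors whose future lies in the library
    lib_vecs = x[lib_idx[:, None] - offsets]
    preds = np.empty(len(x) - n_library)
    for k, target in enumerate(range(n_library, len(x))):
        ref_vec = x[target - tp - offsets]  # reference built from observed values (assumed)
        dist = np.linalg.norm(lib_vecs - ref_vec, axis=1)
        nearest = lib_idx[np.argsort(dist)[: e + 1]]
        preds[k] = x[nearest + tp].mean()
    return preds

def scan_spm_parameters(series, e_values, lag_values, tp=1):
    """Return the (e, lag) pair with the highest squared Pearson correlation."""
    x = np.asarray(series, dtype=float)
    n_library = len(x) // 2
    actual = x[n_library:]
    scores = {}
    for e in e_values:
        for lag in lag_values:
            pred = simplex_forecast(x, e, lag, tp, n_library)
            scores[(e, lag)] = np.corrcoef(actual, pred)[0, 1] ** 2
    best = max(scores, key=scores.get)
    return best, scores

# Toy series and a reduced scanning range, for illustration only.
toy = np.sin(2 * np.pi * np.arange(200) / 20)
best_pair, rho2 = scan_spm_parameters(toy, e_values=range(2, 4), lag_values=range(1, 3))
```

For a 1000-point series, the scanning range of this work would correspond to e_values=range(3, 31) and lag_values=range(1, 15).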

4.1 The Appropriate SPM Parameters in the Chaotic Regime

To get a clearer picture of how the appropriate SPM parameters are determined, the Pearson correlation coefficient, ρ, is squared and plotted against the SPM parameter space, i.e., against the embedding dimension, e, and the lag time, τ. The maximum value of ρ² over e and τ gives the appropriate SPM parameter set. So, for a set of Penna parameters corresponding to a chaotic regime, the appropriate SPM parameter set is the pair of e and τ given by the highest peak in the plot of ρ² vs. e and τ. Figures 4.1 and 4.2 illustrate the selection of the appropriate SPM parameter sets within the chaotic regime. Figure 4.1 corresponds to a chaotic system because a VB population with a b value greater than 1 (here b = 4) gives a chaotic regime of the model [30]. Figure 4.2 also corresponds to a chaotic regime since it has a high th value [30] (in this case, th = 6).

For the chaotic regime, representative data were taken, and the summary is given in Tables 4.1 and 4.2. Table 4.1 lists representative data for varying b and th, which give the chaotic regimes when increased. Table 4.2, on the other hand, lists representative data with respect to the Penna parameter r (a non-chaotic parameter; the chaoticity of the population is maintained by using b = 2).

[Figure 4.1: surface plot of ρ² (vertical axis) against the embedding dimension, e, and the lag time.]

Figure 4.1: Within the chaotic regime: the appropriate embedding dimension, e = 27, and lag time, τ = 5, are given by the highest peak in the plot; Penna parameters: b = 4, r = 2 and th = 2.

[Figure 4.2: surface plot of ρ² (vertical axis) against the embedding dimension, e, and the lag time.]

Figure 4.2: Within the chaotic regime: the appropriate embedding dimension, e = 26, and lag time, τ = 9; Penna parameters: b = 1, r = 2 and th = 6.

Table 4.1: Summary (representative data) of the appropriate SPM parameter
set, e and τ, within the chaotic regime (r = 2, th = 2 with varying b = 2 to 6,
and r = 2, b = 1 with varying th = 3 to 8).

r = 2, th = 2     e     τ
b = 2            21     9
b = 3             3     5
b = 4            27     5
b = 5             3    10
b = 6            14    12

b = 1, r = 2      e     τ
th = 3           11     8
th = 4           30     9
th = 5            3     9
th = 6           26     9
th = 7           10    10
th = 8            3    14

With a set of representative data, characterization of the Penna model
within the chaotic regime is done by correlating the Penna parameters with
their appropriate SPM parameter sets.

SPM and Penna parameter correlation

With the appropriate SPM parameters (e-τ pairs) for Penna parameter sets
corresponding to chaotic regimes, we first look at how the appropriate SPM
parameters vary when a single Penna parameter is varied. In this section, we
consider variation of the three Penna parameters: b, th and r. First, we
look into the behavior of the appropriate embedding dimension as a function
of the Penna parameters b and th, since an increase in these parameters
gives the chaotic regimes. Then, the lag time is observed as well, under
varying b and th. The behavior of the SPM parameters (e and τ) is also
observed under varying values of r (a non-chaotic parameter). To be
consistent with the chaotic regime, b is kept greater than 1 when r is
varied.

Table 4.2: Representative data of the appropriate SPM parameter set, e and
τ, within the chaotic regime (b = 2) for varying r values (th values were
taken at 2 and 8).

SPM parameters for varying r

b = 2, th = 2     e     τ      b = 2, th = 8     e     τ
r = 1             4    10      r = 1             3     1
r = 2            30     9      r = 2             3     9
r = 3             8     8      r = 3             3    10
r = 4            18     9      r = 4            11    10
r = 5            28    10      r = 5            30    10
r = 6            11     2      r = 6            30    10
r = 7            25     5      r = 7            25     5

4.1.1 The Embedding Dimension, e, in relation to the Penna Parameters: b, th and r
The embedding dimension, e, for varying birth rates, b

The embedding dimension is first correlated with the Penna parameter b. The
appropriate embedding dimension, e, varies widely for increasing values of
the birth rate, b (see figure 4.3). To explain this behavior, we examine the
return maps at points² where the appropriate e value is small and where it
is large.

[Figure 4.3 plot: embedding dimension, e, versus birth rate, b, for r = 2,
th = 2.]

Figure 4.3: The appropriate embedding dimension, e, with respect to birth
rate, b (th = 2 and r = 2).

From figure 4.3, we see that the minimum and maximum values of the
appropriate embedding dimension occur at birth rates b = 3 and b = 4,
respectively. The return maps for populations with r = 2, th = 2 and b
values of 3 and 4 are plotted in figure 4.4.

2 where each point represents a different Penna population with a unique
set of parameters b, r and th

[Figure 4.4 plot: x(n+1) versus x(n) for b = 3 and b = 4.]

Figure 4.4: The first return map (z = 1) of 100 points for populations r = 2,
th = 2 with different b values (3 and 4). The maximum population, K = 10000.

For b = 3, where the minimum appropriate embedding dimension occurs (e = 3),
the return map is described by the squares in figure 4.4. The return map of
b = 3 shows a cyclic pattern, in contrast to that of b = 4 (described by
asterisks, *), whose appropriate embedding dimension is high (e = 27; see
table 4.1). This result is consistent with the conjecture that a higher
embedding dimension represents a higher degree of complexity [36]. The
cyclic pattern suggests short-term periodicity [29, 30], which could be
viewed as a form of order within the system: although there is no single
period to be found, the period change follows an ordered pattern.
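The return maps used in this analysis pair each point of a series with the point z steps later. A minimal sketch of that construction is given below. The function name `return_map` and the toy series in the usage note are assumptions made for illustration, and any normalization of the population values applied in the thesis is not reproduced here.

```python
def return_map(series, z=1):
    """Return the list of pairs (x[n], x[n + z]) for a series x.
    With z = 1 this is the first return map."""
    if z < 1:
        raise ValueError("z must be a positive integer")
    return [(series[n], series[n + z]) for n in range(len(series) - z)]
```

As a usage example, the strictly periodic toy series `[1, 2, 1, 2, 1]` yields only two distinct pairs, `(1, 2)` and `(2, 1)`, whereas an irregular series spreads its pairs over many distinct locations. This is the sense in which a smaller spread in the return map is read as a more ordered system.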

The embedding dimension, e, for varying mutation threshold, th

Now, the embedding dimension is correlated with the mutation threshold. The
appropriate e value varies as th is increased (see figure 4.5). As was done
previously, return maps were used for further analysis. The return maps
were made for populations with b = 1, r = 2 and th = 4, 5, where the maximum
value of e occurs at th = 4 and the minimum e value is at th = 5. The return
maps for the two populations are illustrated in figure 4.6. As with the
varying b values in figure 4.4, the population with the smaller e value is
the more ordered system since less spread occurs in its return map. Here,
the population with the low e value is the one with th = 5 (described by
squares in the return map of figure 4.6) and the population with the high e
value is the one with th = 4 (described by asterisks in the return map of
figure 4.6). The result for varying th is, then, again consistent with the
conjecture that a higher e value represents a more complex nature [36].

[Figure 4.5 plot: embedding dimension, e, versus mutation threshold, th, for
b = 1, r = 2.]

Figure 4.5: The appropriate embedding dimension, e, with respect to mutation
threshold, th (b = 1 and r = 2).

[Figure 4.6 plot: x(n+1) versus x(n) for th = 5 and th = 4.]

Figure 4.6: The first return map (z = 1) of 100 points for populations r = 2,
b = 1 with different th values (4 and 5). The maximum population, K = 10000.

The embedding dimension, e, for varying reproductive age, r (low th)

The embedding dimension is then correlated with the reproductive age. This
is done first for populations with a low mutation threshold (th = 2), then
for a high mutation threshold (th = 8). Figure 4.7 shows that the
appropriate e values also vary as r is increased. Return maps are again used
for further analysis. From figure 4.7, the minimum and maximum values of the
appropriate e occur at r = 1 and r = 2, respectively. The return maps of
these populations are shown in figure 4.8. The population with the lower e
value, r = 1 (described by squares in the return map), shows a much smaller
spread than that of r = 2 (described by asterisks, *). Although the
population at r = 2 has a cyclic pattern suggesting short-term periodicity,
the population at r = 1 with the lower e value is more ordered because it
has a single point attractor. Our result for varying r at low th thus
supports the conjecture that a high e represents a more complex nature in
the population [36].

[Figure 4.7 plot: embedding dimension, e, versus reproductive age, r, for
b = 2, th = 2.]

Figure 4.7: The appropriate embedding dimension, e, with respect to
reproductive age, r (b = 2 and th = 2).

The embedding dimension, e, for varying reproductive age, r (high th)

The appropriate embedding dimension is again plotted for varying
reproductive age, this time with a high th value of 8. The result is shown
in figure 4.9. Here, the appropriate e value generally increases as r
increases. Return maps, shown in figure 4.10, were also used to explain
this. The population with the lower e (r = 1, described by squares in figure
4.10) shows a 4-point attractor, which is more ordered than the population
with the higher e (r = 5), whose return map shows a cyclic pattern
(described by asterisks in figure 4.10). So the result for varying r at high
threshold is also consistent with the conjecture that a high e represents a
more complex nature [36].

[Figure 4.8 plot: x(n+1) versus x(n) for r = 1 and r = 2.]

Figure 4.8: The first return map (z = 1) of 100 points for populations b = 2,
th = 2 with different r values (1 and 2). The maximum population, K = 10000.

[Figure 4.9 plot: embedding dimension, e, versus reproductive age, r, for
b = 2, th = 8.]

Figure 4.9: The appropriate embedding dimension, e, with respect to
reproductive age, r (b = 2 and th = 8).

[Figure 4.10 plot: x(n+1) versus x(n) for r = 1 and r = 5.]

Figure 4.10: The first return map (z = 1) of 100 points for populations b = 2,
th = 8 with different r values (1 and 5). The maximum population, K = 10000.

4.1.2 The Lag Time, τ, in relation to the Penna Parameters: b, th and r
The appropriate lag time, τ, is plotted against birth rate, b, in figure
4.11a, with the mutation threshold, th, set to 2 and the reproductive age,
r, set to 2. For increasing b, τ generally increases. The appropriate lag
time for varying th is shown in figure 4.11b. As with increasing b, τ
generally increases as th increases. The lag time, τ, is also correlated
with reproductive age, r, at low and high th (th = 2, 8); see figure 4.12.
At low th, τ generally decreases as r is increased (figure 4.12a), while at
high th, τ varies minimally around the value of 10 (figure 4.12b). As an
overview, the appropriate lag time, τ, for varying Penna parameters b, th
and r seems to vary minimally around the value of 10, which was the
short-term period due to component ages found in a previous study [29].

[Figure 4.11 plots: lag time, τ, versus (a) birth rate, b, for r = 2, th = 2
and (b) mutation threshold, th, for b = 1, r = 2.]

Figure 4.11: (a) The appropriate lag time, τ, with respect to birth rate, b
(th = 2 and r = 2). (b) The appropriate lag time, τ, with respect to mutation
threshold, th (b = 1 and r = 2).

[Figure 4.12 plots: lag time, τ, versus reproductive age, r, for (a) b = 2,
th = 2 and (b) b = 2, th = 8.]

Figure 4.12: The appropriate lag time, τ, with respect to reproductive age,
r (b = 2): (a) low mutation threshold (th = 2); (b) high mutation threshold
(th = 8).
4.1.3 Summary of Results for the Chaotic Regime
As a summary of results for the chaotic regime, refer to figures 4.13 and
4.14.

[Figure 4.13 plots: SPM parameters (embedding dimension, e, and lag time, τ)
versus (a) birth rate, b, and (b) mutation threshold, th.]

Figure 4.13: SPM parameters for varying Penna parameters. (a) e and τ
(represented by circles and squares, respectively) with respect to b and
(b) e and τ (represented by circles and squares, respectively) with respect
to th.

[Figure 4.14 plots: SPM parameters (embedding dimension, e, and lag time, τ)
versus reproductive age, r.]

Figure 4.14: SPM parameters for varying r (the embedding dimension, e, is
represented by circles and the lag time, τ, by squares): (a) low mutation
threshold (b = 2 and th = 2); (b) high mutation threshold (b = 2 and th = 8).
The embedding dimension in relation to the Penna parameters:

1. embedding dimension with respect to birth rate (e vs b): e varies widely
for different b values. The result is consistent with the conjecture that a
higher e represents greater complexity within the population, as shown by
the return maps of populations with varying b values.

2. embedding dimension with respect to mutation threshold (e vs th): e
varies widely for different th values. The result is consistent with the
conjecture that a higher e represents greater complexity within the
population, as shown by the return maps of populations with varying th
values.

3. embedding dimension with respect to reproductive age (e vs r): e varies
widely for different r values at low th. For high th, however, e increases
as r increases. The results are still consistent with the conjecture that a
higher e represents greater complexity within the population, as shown by
the return maps of populations with varying r values.

The lag time in relation to the Penna parameters:

1. lag time with respect to birth rate (τ vs b): τ generally increases as b
increases.

2. lag time with respect to mutation threshold (τ vs th): τ generally
increases as th increases.

3. lag time with respect to reproductive age (τ vs r): τ generally decreases
as r increases for low th populations. For high th populations, τ seems to
vary minimally from a value of 10.

4. the appropriate lag time, τ, for Penna populations within the chaotic
regime seems to have a value close to 10, the short-term period [29].

4.2 The Appropriate SPM Parameters in the Non-Chaotic Regime
The non-chaotic regimes of the Penna model are those with b = 1 and low th
values, with varying r values. Figures 4.15 and 4.16 illustrate the choice
of the appropriate SPM parameters, e and τ, for populations corresponding to
the non-chaotic regime. The highest peak of each plot gives the
corresponding appropriate e-τ pair since it gives the best reconstruction,
i.e., it has the highest ρ². For the non-chaotic regime, the scanning range
for the appropriate embedding dimension is maintained at 3 to 30, just as in
the chaotic regime. For the appropriate lag time, a range of 1 to 14 was
used as well.

[Figure 4.15 plot: ρ² against embedding dimension, e, and lag time, τ.]

Figure 4.15: ρ² vs. embedding dimension, e, and lag time, τ, within the
non-chaotic regime. The appropriate embedding dimension, e = 20, and lag
time, τ = 3; Penna parameters: b = 1, th = 1 and r = 5.

[Figure 4.16 plot: ρ² against embedding dimension, e, and lag time, τ.]

Figure 4.16: Within the non-chaotic regime: the appropriate embedding
dimension, e = 18, and lag time, τ = 5; Penna parameters: b = 1, th = 2 and
r = 5.

Table 4.3 gives the summary of the appropriate SPM parameters in the
non-chaotic regime. This summary is then used for the correlation of the
appropriate SPM parameters, e and τ, with the Penna parameter r.

Table 4.3: Summary of the appropriate SPM parameters, e and τ, within the
non-chaotic regime (b = 1, th = 1 with varying r and b = 1, th = 2 with
varying r).

b = 1, th = 1     e     τ
r = 1            20    14
r = 2             3    12
r = 3            13     8
r = 4            26     2
r = 5            20     3

b = 1, th = 2     e     τ
r = 1            19     6
r = 2            12    14
r = 3             6     6
r = 4            17    13
r = 5            18     5
r = 6            18     4
r = 7            25     1

4.2.1 The Embedding Dimension, e, and the lag time, τ, in relation to the Penna Parameter: r
Since variation of the Penna parameter r does not contribute to population
fluctuations, it gives the non-chaotic regime of the Penna model. The birth
rate (b) and the mutation threshold (th) are kept at low values (b = 1 and
th = 1, 2) to maintain the non-chaotic regime. Figures 4.17 and 4.18 show
that as r increases, the appropriate embedding dimension generally
increases. Return maps were used for further analysis, but since population
time series in the non-chaotic regime are periodic over time, single point
attractors are expected and not much information can be taken from them
(see figure 4.19). The appropriate lag time for varying r values is also
presented, in figure 4.20. The appropriate lag time, τ, generally decreases
as r increases.
[Figure 4.17 plot: embedding dimension, e, versus reproductive age, r, for
b = 1, th = 1.]

Figure 4.17: The appropriate embedding dimension, e, with respect to
reproductive age, r. Penna parameters are b = 1 and th = 1.

[Figure 4.18 plot: embedding dimension, e, versus reproductive age, r, for
b = 1, th = 2.]

Figure 4.18: The appropriate embedding dimension, e, with respect to
reproductive age, r. Penna parameters are b = 1 and th = 2.

[Figure 4.19 plot: x(n+1) versus x(n) for r = 2 and r = 4.]

Figure 4.19: The first return map (z = 1) of 100 points for populations b = 1,
th = 1 with different r values (2 and 4). The maximum population, K = 10000.

[Figure 4.20 plots: lag time, τ, versus reproductive age, r, for (a) b = 1,
th = 1 and (b) b = 1, th = 2.]

Figure 4.20: The appropriate lag time, τ, for varying reproductive age, r
(b = 1). (a) th = 1; (b) th = 2.
4.2.2 Summary of Results for the Non-chaotic Regime
The embedding dimension for the non-chaotic regime varies minimally
[Figure 4.21 plot: SPM parameters (embedding dimension, e, and lag time, τ)
versus reproductive age, r.]

Figure 4.21: SPM parameters e and τ (represented by circles and squares,
respectively) for varying Penna parameter r where th = 1.

[Figure 4.22 plot: SPM parameters (embedding dimension, e, and lag time, τ)
versus reproductive age, r.]

Figure 4.22: SPM parameters e and τ (represented by circles and squares,
respectively) for varying Penna parameter r where th = 2.

For varying r in the non-chaotic regime:

1. The appropriate embedding dimension, e, generally increases as r
increases.

2. The appropriate lag time, τ, takes low values and generally decreases as
r increases.
Chapter 5

Conclusion

The VB implementation of the Penna model was used to generate the
population time series to be characterized. Different population time
series, defined by different values of the birth rate, b, the reproductive
age, r, and the mutation threshold, th, were generated. Since the Penna
model is to be characterized by the simplex projection method, the
population time series generated were reconstructed using SPM. The
characterization is done according to two regimes: chaotic and non-chaotic.
This is done by correlating the Penna parameters with the SPM parameters,
with the Penna parameters defining the regime. That is, the choice of the
Penna parameters depends on the parameters' effect on the chaoticity, or the
lack of it, in the population. High values of the Penna parameters birth
rate, b, and mutation threshold, th, yield the chaotic regime of the Penna
populations¹. The appropriate SPM parameters, the embedding dimension, e,
and the lag time, τ, are chosen with the use of a statistical measure, in
this case the Pearson correlation coefficient. The square of this
coefficient is plotted over the SPM parameters, e and τ. The coordinates of
the maximum ρ² value give the appropriate e and τ.

1 However, some of the return maps showed that some populations, although
with b > 2 and high th, could, for certain combinations of b, th and r, have
periodicity and are therefore non-chaotic. The complexity map of the Penna
model could still be investigated for full mapping.

The chaotic regime is given by an increase in the Penna parameters b and th.
In this regime, the appropriate SPM parameters, e and τ, are correlated with
b, th and r. For the correlation with the non-chaotic parameter, r, b is
kept greater than 1 to maintain chaoticity. The appropriate embedding
dimension, e, varies widely for different values of b, th and r. Return maps
show that the value of e depends on how complex the population is. The
return maps show cyclic patterns, which suggest the presence of short-term
periodicity within the chaotic regime [30]. The results are consistent with
the conjecture that a high embedding dimension represents a higher degree of
complexity [36]. The lag time, τ, on the other hand, exhibits a trend with
respect to b, th and r. An increase in the appropriate lag time, τ, with
increasing b and th was observed. With respect to r, a general decrease in τ
was observed for low th. For high th, the appropriate τ values vary
minimally around 10. Generally, for the chaotic regime, the appropriate lag
time values are around 10, which is the short-term period due to component
ages found in a previous study [29].

The non-chaotic regime is given by Penna parameter sets of low b and th
values through a range of r. As with the chaotic regime, the appropriate SPM
parameters, e and τ, are correlated with a Penna parameter, here the single
parameter r. The return maps for the populations used are consistent with
the non-chaotic regime since they show single point attractors, which
verifies periodicity. The appropriate e value generally increases as r is
increased. The appropriate lag time, τ, on the other hand, generally
decreases as r is increased.
In summary, in the chaotic regime (i.e., for populations defined by high b
and th), the appropriate e takes on varying values with respect to
increasing Penna parameters; i.e., the appropriate embedding dimension, e,
exhibits no general trend with respect to r, b and th. The appropriate
embedding dimension, e, depends on the complexity of the population. Higher
dispersion in the population time series is deemed to represent greater
complexity since high population fluctuation is evidence of chaoticity. High
population fluctuation translates to greater dispersion in the return maps
of populations described by a high embedding dimension when reconstructed
using SPM. The results are consistent with the conjecture that high e values
correspond to greater complexity [36].

The lag time is the distance between points (within the basis points, i.e.,
the first half of the time series of 1000 points) to be projected in the
e-dimensional space. The points in the e-dimensional space create the
simplex, or neighborhood of points, which is then used to reconstruct the
next half of the series. The appropriate lag time, τ, in the chaotic regime
takes on values around 10. This is consistent with the short-term period
found due to component ages [29]. This means that high correlation occurs
when the lag time used is 10. The points used to predict the next part of
the series are then separated by 10 timesteps, and since the short-term
period is 10, points taken every 10 timesteps generally have the same value
(in this case, this value is the distribution of component ages). Higher
correlation is therefore expected because the points used for prediction are
generally related.
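The role of the lag time can be made concrete with a small sketch of a lagged embedding followed by a nearest-neighbor prediction. The Python code below is an illustration written for this passage, not a translation of the Fortran listing in Appendix A: the names `lagged_vector` and `simplex_predict`, the zero-based indexing and the neighbor search range are all assumptions. Like the `nextval` subroutine in the appendix, it averages the e + 1 nearest neighbors without distance weights.

```python
import math


def lagged_vector(x, n, e, tau):
    """Return the e-dimensional lagged vector
    (x[n], x[n - tau], ..., x[n - (e - 1) * tau])."""
    return [x[n - i * tau] for i in range(e)]


def simplex_predict(x, n_basis, e, tau, tp=1):
    """Predict the value tp steps after the last basis point x[n_basis - 1],
    using only the first n_basis points of x."""
    target = n_basis - 1          # index of the last known point
    first = (e - 1) * tau         # earliest index with a full lagged vector
    target_vec = lagged_vector(x, target, e, tau)
    candidates = []
    # Neighbors must have a known value tp steps ahead, so j + tp <= target.
    for j in range(first, target - tp + 1):
        vec = lagged_vector(x, j, e, tau)
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(target_vec, vec)))
        candidates.append((dist, j))
    candidates.sort()
    neighbors = candidates[: e + 1]   # the e + 1 nearest points form the simplex
    if len(neighbors) < e + 1:
        raise ValueError("not enough basis points for this e and tau")
    # Unweighted average of where the neighbors went tp steps later.
    return sum(x[j + tp] for _, j in neighbors) / len(neighbors)
```

As a usage example, for a toy series with an exact period of 10, `x = [n % 10 for n in range(200)]`, the call `simplex_predict(x, 100, 3, 10)` finds neighbors that coincide with the target vector and returns the true next value, `x[100]`. A real population series is only approximately periodic, so the match is approximate rather than exact.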

In the non-chaotic regime, return maps show single point attractors, which
conforms with the non-chaotic regime since periodicity is present. The
appropriate embedding dimension, e, generally increases as r increases,
while the appropriate lag time, τ, generally decreases as r is increased.
Appendix A

Programming Details

Machine specifications:
Operating System: Linux (Ubuntu 6.06)
Machine epsilon: 1.08420217E-19
Compiler: GNU project Fortran 77 compiler

A.1 Independent Penna Population Generation

The FORTRAN program used for the generation of independent populations with
varying parameters is based on earlier programs developed at the Structure
and Dynamics Group of the National Institute of Physics by Nombres [30] and
Beech [29]. The code can be found in the references cited.
A.2 Distribution and Period of the Random Number Generator

A discussion of the random number generator used in the population
generation can be found in one of the references cited [30], together with
the code for Penna population generation. That reference shows that the
points generated by the random number generator used (randlcm) have an even
distribution, via a return map of the first 10000 points generated from 0 to
1 (see figure A.1 [30]).

[Figure A.1 plot: return map (n+1 versus n) of the generated random numbers,
both axes from 0 to 1.]

Figure A.1: Return map of the points generated by the random number
generator used. 10000 points were used with nseed = 1.

From the same reference (Section B.2 of the Appendix): The random number
generator, randlcm(), used in the program has a period of about 2 billion,
so that it can generate numbers between 0 and 1 this number of times before
it repeats these values. If we check how many times randlcm() is called
within the program per individual, we see that there are only two calls: one
in subroutine mutate() for the random death procedure and the other in
subroutine mutate() for the mutation process. A maximum of approximately
25000 individuals are alive in one timestep, so that randlcm() will be
called 50000 times in one timestep. With the allotted period for randlcm(),
a maximum of 40000 timesteps are allowed, which is more than enough for this
study. Thus, the trouble of exhausting the random number generator, where it
repeats the values that it generates, will be avoided.
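The timestep budget quoted above follows from a short calculation, sketched here in Python. The function name `max_timesteps` is an illustrative choice, and the inputs are the round figures given in the text (a period of about 2 billion, about 25000 individuals, 2 calls per individual).

```python
def max_timesteps(rng_period, max_individuals, calls_per_individual):
    """Number of whole timesteps that fit within one period of the
    random number generator."""
    calls_per_timestep = max_individuals * calls_per_individual
    return rng_period // calls_per_timestep


# 2 calls for each of 25000 individuals is 50000 calls per timestep.
print(max_timesteps(2_000_000_000, 25_000, 2))  # prints 40000
```

The series analyzed in this study are 1000 timesteps long, well within this budget.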

A.3 Simplex Projection and Statistical Evaluation
A FORTRAN code was developed for implementing the SPM, wherein the lag time
was included in the edimdist subroutine so that it can be varied and
correlated with the Penna parameters. The program also includes the
statistical measure, which evaluates the Pearson correlation coefficient of
the generated Penna populations. The following shows the main body of the
program as well as the subroutines edited to suit our purposes:
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* Simplex Projection method applied to the time series generated *
* by the Penna model *
* *
* Input: *
* embdim embedding dimension *
* t lag time *
* xmax maximum x to evaluate *
* nahead *
* *.dat file generated by the main program *
* *
* Output: *
* xact() actual value *
* xpred() predicted value *
* xact2() actual value to be (?) *
* xpred2() predicted value to be (?) *

* xval value where function is evaluated (?) *
* xvalar(), xvalar2() arrays of xval *
* xstep stepsize *
* tp timestep(s) into the future *
* nbasis *
* nextxt predicted value *
* *
* Last modified: 2007.03.21 IBO, SB *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
implicit none
character*1 choice
real*4 t(1000),x(1000),xb(1000)
real*4 xact(1000)
integer*4 xpred(1000)
real*4 xact2(1000), xpred2(1000)
real*4 xval
real*4 xvalar(1000), xvalar2(1000)
real*4 err(500)
real*4 sumAct,sumPred
real*4 countAct,countPred
real*4 meanAct,meanPred
real*4 diffAct(500),diffPred(500)
real*4 prodNum(500)
real*4 sumProd
real*4 sumActDen,sumPredDen
real*4 diffActDen(500),diffPredDen(500)
real*4 cc,num,den
real*4 xmax
real*4 xstep
integer*4 embdim
integer*4 tp,nahead
integer*4 nbasis
real*4 nextxt
integer*4 i,j,k
real*8 edimdist
external edimdist

open(1,file='r1.dat',status='old') !Penna data file

read(1,*,end=100)x

!write(*,*)x

xmax = 1000
nahead = 500

xstep = xmax/999

do embdim = 3,20

do i = 1,1000
xval = (i-1)*xstep
xvalar(i) = xval
xact(i) = x(i)
end do

do i = 1,1000
xpred(i) = xact(i)
end do

nbasis = 1000 - nahead


do tp = 1,nahead
call nextval(tp,embdim,nbasis,xact,nextxt)
xpred(nbasis+tp) = nextxt
xact2(tp) = xact(nbasis+tp)
xvalar2(tp) = xstep*tp
end do

call pcc(xpred,xact,nbasis,nahead,cc)

write(*,*)cc

end do

close(1)

100 continue
end !program sp.f

************************************************************************
subroutine nextval(tp,edim,n,x,nextx)
implicit none
integer*4 tp, edim, n !n is number of point so far
integer*4 n1 !n1 is index of chosen point
real*4 nextx !intent(out)
real*4 x(1000),dist(1000)
integer*4 indx(1000)
integer*4 j
real*8 edimdist
external edimdist

n1 = n-(tp-1) !N+1 is predicted by adding tp
!Calculate distances from n1
do j=1,n
dist(j) = edimdist(edim,x,n1,j)
end do

!Sort distances and determine relevant indices


call indexx(n,dist,indx)
!For e-dimensions, pertinent indx are indx(1),...,indx(edim+1)

!Calculate predicted value...


nextx = 0.0 !initialize to zero
!tracing in time is keeping track of weights
!do i=tp,0,-1
do j = 2,edim+2 !nearest e+1 neighbors
nextx = nextx+x(indx(j)+tp) !*exp(-real(tp-i))
end do
!end do

nextx = nextx/(edim+1)

end !subroutine nextval

************************************************************************
real*8 function edimdist(edim,x,n1,n2)
implicit none
integer*4 edim, n1, n2
real*4 x(1000)
real*4 sumsq
integer*4 i
integer*4 t

t = 2 ! set the (integer) value for the lag time here


sumsq = 0.0 !initialize sum to zero
do i = 0,edim-1
sumsq = sumsq+(x(n1-(i*t))-x(n2-(i*t)))**2.0
end do
edimdist = sqrt(sumsq)
end !function edimdist

************************************************************************
subroutine indexx(n,arr,indx)
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* Indexes an array arr(1:n), i.e., outputs the array indx(1:n) *
* such that arr(indx(j)) is in ascending order for j = 1, 2, . . ., N *

* The input quantities n and arr are not changed. *
* *
* From "NUMERICAL RECIPES IN FORTRAN77: *
* THE ART OF SCIENTIFIC COMPUTING" (ISBN 0-521-43064-X) *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
integer n,indx(n),M,NSTACK
real arr(n)
parameter (M=7,NSTACK=50)
integer i,indxt,ir,itemp,j,jstack,k,l,istack(NSTACK)
real a

do j=1,n
indx(j)=j
end do
jstack=0
l=1
ir=n
1 if (ir-l.lt.M) then
do j=l+1,ir
indxt=indx(j)
a=arr(indxt)
do i=j-1,l,-1
if (arr(indx(i)).le.a) goto 2
indx(i+1)=indx(i)
end do
i=l-1
2 indx(i+1)=indxt
end do
if (jstack.eq.0) return
ir=istack(jstack)
l=istack(jstack-1)
jstack=jstack-2
else
k=(l+ir)/2
itemp=indx(k)
indx(k)=indx(l+1)
indx(l+1)=itemp
if (arr(indx(l)).gt.arr(indx(ir))) then
itemp=indx(l)
indx(l)=indx(ir)
indx(ir)=itemp
end if
if (arr(indx(l+1)).gt.arr(indx(ir))) then
itemp=indx(l+1)
indx(l+1)=indx(ir)

indx(ir)=itemp
end if
if (arr(indx(l)).gt.arr(indx(l+1))) then
itemp=indx(l)
indx(l)=indx(l+1)
indx(l+1)=itemp
end if
i=l+1
j=ir
indxt=indx(l+1)
a=arr(indxt)
3 continue
i=i+1
if (arr(indx(i)).lt.a) goto 3
4 continue
j=j-1
if (arr(indx(j)).gt.a) goto 4
if (j.lt.i) goto 5
itemp=indx(i)
indx(i)=indx(j)
indx(j)=itemp
goto 3
5 indx(l+1)=indx(j)
indx(j)=indxt
jstack=jstack+2
if (jstack.gt.NSTACK) pause 'STACK too small in indexx'
if (ir-i+1.ge.j-l) then
istack(jstack)=ir
istack(jstack-1)=i
ir=j-1
else
istack(jstack)=j-1
istack(jstack-1)=l
l=i
end if
end if
goto 1
end

************************************************************************
subroutine pcc(xpred,xact,nbasis,nahead,cc)
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* Calculates the square of the Pearson correlation coefficient *
* to determine the error of the points predicted by the SPM from the *
* actual time series generated by the Penna model. *

* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
implicit none
real*4 xact(1000)
integer*4 xpred(1000)
real*4 sumAct,sumPred
real*4 countAct,countPred
real*4 meanAct,meanPred
real*4 diffAct(500),diffPred(500)
real*4 prodNum(500)
real*4 sumProd
real*4 sumActDen,sumPredDen
real*4 diffActDen(500),diffPredDen(500)
real*4 cc,num,den
integer*4 tp,nahead
integer*4 nbasis

sumPred = 0.0
countPred = 0.0
do tp = 1,nahead
sumPred = sumPred + xpred(nbasis+tp)
countPred = countPred + 1
end do
meanPred = sumPred/countPred

sumAct = 0.0
countAct = 0.0
do tp = 1,nahead
sumAct = sumAct + xact(nbasis+tp)
countAct = countAct + 1
end do
meanAct = sumAct/countAct

do tp = 1,nahead
diffPred(tp) = xpred(nbasis+tp) - meanPred
diffAct(tp) = xact(nbasis+tp) - meanAct
prodNum(tp) = diffPred(tp)*diffAct(tp)
end do

num = 0.0
do tp = 1,nahead
num = num + prodNum(tp)
end do

sumPredDen = 0.0
sumActDen = 0.0

do tp = 1,nahead
diffPredDen(tp) = (xpred(nbasis+tp) - meanPred)**2
diffActDen(tp) = (xAct(nbasis+tp) - meanAct)**2
sumPredDen = sumPredDen + diffPredDen(tp)
sumActDen = sumActDen + diffActDen(tp)
end do

den = SQRT(sumActDen*sumPredDen)

cc = (num/den)**2

end

Appendix B

Effect of Randomization on Penna populations and on the appropriate SPM parameters obtained

To see whether randomization has a considerable effect on the appropriate
SPM parameters (embedding dimension, e, and lag time, τ), the seed number¹
was varied and the effect on the appropriate e and τ values was observed. We
change the value of the seed from 1 to 2 and then generate the population
with the same Penna parameters. As an example, we used parameters b = 3,
r = 2 and th = 2. Figure B.1 shows the population for b = 3, r = 2 and
th = 2 with two different seed values used by the random number generator.
The two populations with different seed values (1 and 2) have basically the
same amount of fluctuation about the same mean value. This means that the
behavior of the population is most likely the same since, in their return
maps, the same amount of fluctuation would yield the same amount of
dispersion. From the conjecture that a high embedding dimension value
represents greater complexity, the greater the dispersion in the return
maps, the more complex the system is. Since both populations fluctuate by
the same amount about the same mean value, we expect that their embedding
dimensions should not differ greatly. For the populations described by
figure B.1, the appropriate embedding dimensions found are the same, e = 3.
For the lag time, the appropriate τ varies minimally: the population with
seed = 1 has τ = 5 while that with seed = 2 has τ = 7. This variation is
small enough that we can say that the randomization has more or less no
effect on the appropriate SPM parameters obtained.

1 See the random number generator, randlcm(), from codes found in
references [30], [29].
[Plot of population, N(t), against time, t, for seed = 1 and seed = 2]

Figure B.1: The seed value was changed from 1 to 2.

Bibliography

[1] T. J. Penna. A bit-string model for biological aging. Journal of
Statistical Physics, 1995.

[2] L. Partridge and M. Mangel. Messages from mortality: The evolution of
death rates in the old. TREE, 1999.

[3] E. E. Peters. Chaos and order in the capital markets. John Wiley and
Sons, 1991.

[4] F. Tata and C. Vassilicos. Is there chaos in economic time series? A
study of the stock and foreign exchange markets. LSE Financial Markets
Group, DP 120, 1991.

[5] J. A. Scheinkman and B. LeBaron. Non-linear dynamics and stock
returns. Journal of Business, 62: 311-328, 1989.

[6] C. Alexander and I. Giblin. Creating Order Out of Chaos. RISK
Magazine, 7(6): 71-76, 1994.

[7] P. Grassberger and I. Procaccia. Characterisation of strange
attractors. Phys. Rev. Letters, 50: 346-349, 1983.

[8] M. Casdagli. Non-linear prediction of chaotic time series. Physica D,
35: 335-356, 1989.

[9] M. Casdagli. Chaos and deterministic vs stochastic non-linear
modelling. Journal of the Royal Statistical Society, Series B, 54: 303-328,
1992.

[10] D. Nychka, S. Ellner, A. R. Gallant and D. McCaffrey. Finding chaos
in noisy systems. Journal of the Royal Statistical Society, Series B, 54:
399-426, 1992.

[11] G. Sugihara and R. May. Nonlinear forecasting as a way of
distinguishing chaos from measurement error in time series. Nature, 344,
1990.

[12] C. O. Alexander and I. Giblin. Searching for chaos in financial
markets. University of Sussex Mathematics Research Report, 93-24, 1993.

[13] S. Neil Rasband. Chaotic Dynamics of Nonlinear Systems. John Wiley
& Sons, Inc., NY, USA, 1990.

[14] N. H. Packard, J. P. Crutchfield, J. D. Farmer and R. S. Shaw.
Geometry from a time series. Phys. Rev. Lett., 45: 712-715, 1980.

[15] F. Takens. Detecting strange attractors in fluid turbulence. In
Dynamical Systems and Turbulence, ed. by D. A. Rand and L.-S. Young.
Springer, Berlin, 1980.

[16] S. Moss de Oliveira, D. Alves and J. S. Sa Martins. Evolution and
Ageing. Physica A, 285: 77-100, 2000.

[17] K. Malarz. Searching for scaling in the Penna bit-string model of
biological aging. International Journal of Modern Physics C, 2000.

[18] J. S. Sa Martins and D. Stauffer. Justification of Sexual
Reproduction by Modified Penna Model of Ageing. Physica A, 294: 191-194,
2001.

[19] J. S. Sa Martins and S. Moss de Oliveira. Why Sex? - Monte Carlo
Simulations of Survival After Catastrophes. International Journal of
Modern Physics C, 9: 421-432, 1998.

[20] D. Makowiec. Penna Model of Biological Aging on a Lattice. Physica A,
pages 208-222, January 2001.

[21] M. He, J. Lin, H. Jiang and X. Liu. The Two Populations Cellular
Automata Model with Predation Based on the Penna Model. Physica A, 312:
243-250, September 2002.

[22] D. Stauffer, P. M. C. de Oliveira, S. Moss de Oliveira, T. J. P. Penna
and J. S. Sa Martins. Computer Simulations for Biological Aging and
Sexual Reproduction.

[23] J. S. Sa Martins, S. Moss de Oliveira and G. A. de Medeiros. Simulated
Ecology-Driven Sympatric Speciation. Physical Review E, 64, 2001.

[24] J. B. Coe, Y. Mao and M. E. Cates. Solvable senescence model with
positive mutations. Phys. Rev. E, 70, 2004.

[25] D. Stauffer, P. M. C. de Oliveira, S. Moss de Oliveira and J. S. Sa
Martins. Positive mutations and mutation-dependent Verhulst factor.
http://arXiv:condmat/0308532v1, August 26, 2003.

[26] J. E. Cohen. How Many People Can the Earth Support? W.W. Norton
and Co., New York, 1995.

[27] J. S. Sa Martins and S. Cebrat. Random Deaths in a Computational
Model for Age-Structured Populations. Theory in Biosciences, 119: 156-165,
2000.

[28] J. B. Coe and Y. Mao. Solvable Senescence Model Showing a Mortality
Plateau. Phys. Rev. E, 89, 2002.

[29] M. D. N. Beech. Age Structure and Survivability in an 8-bit Penna
Model. Thesis, University of the Philippines, Diliman, 2005.

[30] C. D. C. Nombres. Population Dynamics in the Penna Model with
Short-Bit Strings. Thesis, University of the Philippines, Diliman, 2004.

[31] M. D. N. Beech and R. S. Banzon. Periodicity in a Penna Model. 22nd
SPP Congress Proceedings, 2002.

[32] F. Takens. Lecture Notes in Mathematics, ed. by D. A. Rand and L.-S.
Young. Springer, Berlin, 1980.

[33] Jonq Juang Cheng Huang, J. Y. Wang. Controlling chaotic behavior
of heavy to light hole mixing tunneling by external fields. Journal of
Quantum Electronics, August 1997.

[34] D. Holton and R. May. The Nature of Chaos, ed. by T. Mullin. Oxford
University Press, USA, 1993.

[35] M. Lim and C. Saloma. Enhancement of low-resolution Raman spectra
by simplex projection. Optics Communications, 186, 2000.

[36] R. P. V. Mandingiado. Complexity and Competitive Advantage in Penna
Populations. Thesis, University of the Philippines, Diliman, 2006.
