MOD002641 Assignment 2013-14

Rev Date: 10-02-14
Data Structures & Algorithms
MOD002641

Ian van der Linde, PhD

Assignment 2013-14:
Experimental Algorithm Characterisation: Darwin's Finches

Introduction

During the module so far, we have conducted thought experiments to determine the best case,
worst case and average case behaviour of the algorithms that we have studied. For simple
algorithms, this approach is perfectly possible and it has served us well. However, sometimes,
when an algorithm is more complicated or contains random elements, paper-based analysis
becomes difficult.

In this assignment, we will be considering two types of algorithm: the first relies upon random
chance to stumble upon the optimal solution to a problem. We will refer to this kind of algorithm
as Saltation, which is an embodiment of Hoyle's Fallacy. The second uses the principle of
Cumulative Selection to evolve a solution to a problem. This is a Genetic Algorithm, an
embodiment of Darwinian evolution by survival of the fittest. Refer to Lecture 05 for a discussion
of these procedures. Each algorithm holds a (simulated) portion of the genome of Darwin's finches
that codes for beak length, and we are going to select for the individuals with the longest beaks to
simulate an environment in which longer beaks confer a survival advantage (e.g., by enabling
long-beaked individuals to access more food, such as grubs buried deeper inside trees). The simulation
lets us experiment with what would happen (in terms of how many generations it would take to
evolve) if beak length were coded on different numbers of genes (genome length), and what
effect the number of different possible alleles of these genes might have (gene varieties).

In the first part of the assignment (worth 60 marks), we will characterise the performance of these
algorithms by empirical experimentation. Essentially, this means running the algorithms on a real
computer under different input conditions and recording how much work each approach undertook. In
the second part of the assignment (worth 40 marks), we will modify the programs and evaluate the
impact that these modifications had on performance.

Two programs, written in C/C++, are provided. Each program has two versions: a noisy version that
produces informative screen outputs while running to help you understand how they work, and a
quiet version that will run much faster and can be used for data collection:

The first program uses a Saltation approach. It tries to generate a random number sequence
(which we will call the child genome) that has the largest sum (target fitness). The sum
represents beak length. The program counts how many attempts were needed to obtain an
optimally long beak. We will refer to attempts as generations. The program reports this
number upon completion.

The second program uses a Genetic Algorithm to tackle the same task. It evolves the number
sequence (child genome) over successive generations by using the longest-beaked child from
the previous generation as a parent genome to seed the next generation. To create each child
genome, the current parent genome is mutated slightly (i.e., we randomise some, but not all,
of its genes). Again, the number of generations it took to obtain an optimally long beak is
recorded and reported as a screen output upon completion.

Part 1 (60 marks): Experimental Comparison of Saltation and Cumulative Selection

In part 1, your task is to experimentally evaluate the performance of these two algorithms
(saltation vs. cumulative selection) under different input conditions, and to make some deductions
about their behaviour.

If you do not have a C/C++ compiler installed on your computer, you may wish to download and
install the free CodeBlocks C/C++ IDE and compiler bundle.

(a) First, we will test the Saltation algorithm. It requires two command-line arguments to work,
so must be run from a shell (e.g., Command Prompt in Windows, or Terminal in MacOS or
Linux). The first command-line argument (following the name of the executable file) is the
length of the number sequence to be used (the genome length); the second is the number of
possible values that each number in the sequence can assume (the gene varieties). The
lengthiest beak is calculated as the highest numeric sum possible, given the genome length
and number of gene varieties.

When you run the program, it outputs how many attempts were needed to obtain the
lengthiest possible beak by repeatedly populating the child genome with random values from
1 to gene varieties and measuring the sum of each genome until the lengthiest possible beak
length is obtained. We will refer to this process as evaluating the fitness of the child genome.
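
The procedure described above can be sketched in C++ as follows. This is a minimal illustration, not the provided program: the function names (fitness, saltationGenerations) and the use of std::mt19937 as the random source are assumptions made for the example. It assumes the target fitness is genome length multiplied by gene varieties, i.e., every gene at its maximum value.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// Fitness of a genome: the sum of its genes, standing in for beak length.
int fitness(const std::vector<int>& genome) {
    return std::accumulate(genome.begin(), genome.end(), 0);
}

// Saltation sketch (hypothetical helper): keep generating completely random
// child genomes until one reaches the target fitness, then return how many
// attempts (generations) were needed.
// Warning: the loop has no upper bound, so large inputs may run for a very long time.
std::uint64_t saltationGenerations(int genomeLength, int geneVarieties, std::mt19937& rng) {
    assert(genomeLength >= 1 && geneVarieties >= 1);
    const int targetFitness = genomeLength * geneVarieties;  // every gene at its maximum
    std::uniform_int_distribution<int> randomGene(1, geneVarieties);
    std::vector<int> child(genomeLength);
    std::uint64_t generations = 0;
    do {
        for (int& gene : child) gene = randomGene(rng);  // values from 1 to geneVarieties
        ++generations;
    } while (fitness(child) != targetFitness);
    return generations;
}
```

For instance, seeding a generator with std::mt19937 rng(12345) and calling saltationGenerations(3, 4, rng) returns the number of attempts for a genome length of 3 with 4 gene varieties; the value depends on the seed.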

Plot a graph with genome length on the x-axis and generations to evolve on the y-axis. Run
the program to test its performance for a range of genome lengths (i.e., each position on the
x-axis, such as from 1 to 10), and number of gene varieties (e.g., 2, 4, 8, 16). This will require
several separate lines on your graph, one line for each of the different numbers of gene
varieties we run for (e.g., using the settings suggested above, there will be 4 separate lines).

Note that this algorithm could take minutes, hours or even days to run. It might also never
finish, so start with small genome lengths and low numbers for gene varieties and build up to
the larger values.

For each particular setting (e.g., genome length=3, gene varieties=4) you will need to repeat
your measurement a number of times (e.g., 10) to obtain a good estimate (smooth lines in
your graph), because we want average case behaviour, and if we don't run each configuration
a number of times and take the mean, our estimate may be nearer to the worst or best cases
due to very bad or very good luck.

If you run for the range of settings suggested above (10 genome lengths and 4 settings for
gene varieties), there will be 40 data points in your graph in total, 10 for each of the 4 lines.

As mentioned above, each data point on our graph should be the mean of a number of
repetitions of that setting. However, you should annotate each data point with an error bar
showing the dispersion of the samples contributing to each data point. Error bars can be easily
added to Excel graphs. The standard error for a set of measurements for a given point can be
calculated in Excel using the following formula in a conveniently located cell:
=stdev(A1:A10)/sqrt(10), assuming that we have 10 repeated measurements that are
stored in column A, rows 1 to 10. You could also plot in MATLAB, if you are more familiar with
it. Remember to label your axes, and to title your graphs. Not doing this will lose you marks!
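
If you would rather compute the mean and standard error in code than in a spreadsheet, the calculation can be sketched as below. The function names (mean, standardError) are hypothetical. The sketch uses the sample standard deviation (dividing by n - 1), which is what Excel's STDEV computes, and then divides by the square root of the number of samples.

```cpp
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Arithmetic mean of the repeated measurements for one data point.
double mean(const std::vector<double>& samples) {
    assert(!samples.empty());
    return std::accumulate(samples.begin(), samples.end(), 0.0) /
           static_cast<double>(samples.size());
}

// Standard error of the mean: sample standard deviation divided by sqrt(n).
// Needs at least two samples because the sample variance divides by n - 1.
double standardError(const std::vector<double>& samples) {
    assert(samples.size() >= 2);
    const double n = static_cast<double>(samples.size());
    const double m = mean(samples);
    double squaredDeviations = 0.0;
    for (double x : samples) squaredDeviations += (x - m) * (x - m);
    const double sampleStdev = std::sqrt(squaredDeviations / (n - 1.0));
    return sampleStdev / std::sqrt(n);
}
```

Each data point would then be plotted at mean(samples), with an error bar of plus and minus standardError(samples).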

Think about how best to scale your axes to show your data in the most interpretable way.

After you have completed data collection and plotted the graph, you will have characterised
the saltation algorithm. Write two or three paragraphs explaining your findings and distil to a
growth function formula, if you can, by thinking of the points as values from a sequence.

(20 marks)

(b) Next, we will test the Cumulative Selection algorithm. Just like Saltation, it accepts
command-line inputs, so must be run from a shell. Like saltation, genome length and gene varieties are
accepted as the first two command-line arguments. However, our Cumulative Selection
algorithm accepts two additional arguments, corresponding to two other settings that it
requires. In this question we will evaluate the impact of the first of these settings, mutation
rate.

During the creation of a new child genome from a parent genome, the mutation rate dictates
how likely it is that a particular gene (i.e., a particular value in our number sequence) will be
replaced. For example, a mutation rate of 0 means that there is no chance of any of the genes
being replaced. A mutation rate of 100 means that every single gene will be replaced, making
the algorithm effectively the same as Saltation.
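
The role of the mutation rate can be sketched in C++ as below. This is an illustration under stated assumptions rather than the provided program: the names (Genome, mutate, cumulativeSelectionGenerations) are hypothetical, the mutation rate is treated as a per-gene percentage from 0 to 100, the first parent is assumed to be random, and a maxGenerations cap (not part of the brief) is added so that the loop always terminates, e.g., when the mutation rate is 0.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

using Genome = std::vector<int>;

int fitness(const Genome& genome) {
    return std::accumulate(genome.begin(), genome.end(), 0);
}

// Copy the parent, then replace each gene with a fresh random value with a
// probability of mutationRate percent (0 = never, 100 = always).
Genome mutate(const Genome& parent, int geneVarieties, int mutationRate, std::mt19937& rng) {
    assert(geneVarieties >= 1 && mutationRate >= 0 && mutationRate <= 100);
    std::uniform_int_distribution<int> percent(1, 100);
    std::uniform_int_distribution<int> randomGene(1, geneVarieties);
    Genome child = parent;
    for (int& gene : child) {
        if (percent(rng) <= mutationRate) gene = randomGene(rng);
    }
    return child;
}

// Cumulative selection sketch: each generation breeds childrenPerGeneration
// mutated children from the current parent, and the fittest child becomes the
// next parent. Returns the generations used (at most maxGenerations).
std::uint64_t cumulativeSelectionGenerations(int genomeLength, int geneVarieties,
                                             int mutationRate, int childrenPerGeneration,
                                             std::uint64_t maxGenerations, std::mt19937& rng) {
    assert(genomeLength >= 1 && childrenPerGeneration >= 1);
    const int targetFitness = genomeLength * geneVarieties;
    // Assumption: start from a fully random parent (a 100% mutation of any genome).
    Genome parent = mutate(Genome(genomeLength, 1), geneVarieties, 100, rng);
    std::uint64_t generations = 0;
    while (fitness(parent) != targetFitness && generations < maxGenerations) {
        Genome best = mutate(parent, geneVarieties, mutationRate, rng);
        for (int c = 1; c < childrenPerGeneration; ++c) {
            Genome child = mutate(parent, geneVarieties, mutationRate, rng);
            if (fitness(child) > fitness(best)) best = child;
        }
        parent = best;  // the longest-beaked child seeds the next generation
        ++generations;
    }
    return generations;
}
```

Note that in this sketch the best child replaces the parent even if it is less fit than the parent; whether the provided program behaves the same way should be checked against its source.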

For each genome length, and number of gene varieties, you are to test different values of
mutation rate, to see what effect this has on performance (i.e., which value permits the target
genome to be evolved in the fewest generations).

For example, you might measure the number of generations required to evolve to the target
fitness for each mutation rate from 2 to 20 in steps of 2 (10 steps). Again, each
measurement will need to be repeated a number of times, creating standard error bars in
Excel or MATLAB.

This time, it makes more sense to place mutation rate on the x-axis, and generations to evolve
target fitness on the y-axis. If we test mutation rates 2 to 20, we can have one line
on the graph for each number of gene varieties (2, 4, 8, 16), as we did in part (a), above.
However, since genome length is no longer on the x-axis, we will need to generate several
graphs, one for each genome length (e.g., if we test for genome lengths from 1 to 10, we will
need 10 graphs).

Furthermore, Cumulative Selection takes one additional argument: the children per
generation. For the time being, we will fix this at 8.

Again, think about how best to scale your axes to show your data in the most interpretable
way.

After you have completed this question, you should write 2 or 3 paragraphs explaining your
findings. For instance, does the optimum mutation rate (i.e., the mutation rate that enables the
target genome to be evolved in the fewest generations) differ as we change the number of
gene varieties and/or the genome length, or is it always the same?

(20 marks)

(c) Next, we will test the impact of the number of children per generation on the number of
generations required to evolve the target genome in our Cumulative Selection algorithm.

Create 10 graphs, one for each genome length from 1 to 10, like we did in part (b), above. On
the x-axis of each graph, vary children per generation (2, 4, 8, 16). On the y-axis we will have
generations to evolve the target fitness.

There will be four lines on each graph, one for each number of gene varieties (2, 4, 8, 16). For
each data point, set the mutation rate to be the optimum value for that genome length and
number of gene varieties, as ascertained in question (b) - i.e., the mutation rate that yields the
fewest generations to evolve the target fitness.

Again, think about how best to scale your axes to show your data in the most interpretable
way. After you have completed this question, you should write 2 or 3 paragraphs explaining
your findings.
(20 marks)

Part 2 (40 marks): Modifying the String Matching Algorithms

In part 2, your task is to experimentally evaluate the impact of several modifications to the two
algorithms that you have been provided with. You are to submit both your code, along with your
experimental results and discussion (see below). Note that to receive full marks, your code must be
commented, correctly indented, economical (the smaller the better), and use sensible/intuitive
variable names. Refer to the two example programs for guidance on how to accomplish this.

(a) What will happen if, instead of the target genome being the number sequence with the
highest value, we attempt to obtain the sequence with the lowest value?

Present the results of your experiment, in the form of one or more graphs, and a written
summary of your findings (2-3 paragraphs). Note that this modification can be applied to both
Saltation and Cumulative Selection algorithms.

(10 marks)

(b) At present, the Cumulative Selection program has only a single parent genome to sire each
generation. It is common for a genetic algorithm to keep a set of the best child genomes from
the preceding generation (i.e., parent genomes for the next generation), rather than just the
single best one. This can be decided by selecting the N best children, sometimes expressed as
a percentage of the total number of children produced per generation (e.g., keeping the top
10% of child genomes where there are 100 children per generation will entail storing 10 child
genomes). This setting should be controlled through a further command-line argument.

This is called a breeding pool. Modify the program to keep a set of the best child genomes to
become new parent genomes in the next generation. When it comes to creating new child
genomes for the next generation, select one parent genome from the breeding pool at random
(called a stochastic approach).
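
One way the two steps above (keeping the N best children, then choosing a parent at random) might be structured is sketched below. This is only a starting point under assumptions, not a model answer: the names (Genome, selectBreedingPool, pickParent) are hypothetical, and the pool size is passed as an absolute count rather than a percentage.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

using Genome = std::vector<int>;

int fitness(const Genome& genome) {
    return std::accumulate(genome.begin(), genome.end(), 0);
}

// Keep the poolSize fittest children, ordered from highest to lowest fitness.
std::vector<Genome> selectBreedingPool(std::vector<Genome> children, std::size_t poolSize) {
    assert(poolSize >= 1 && poolSize <= children.size());
    std::sort(children.begin(), children.end(),
              [](const Genome& a, const Genome& b) { return fitness(a) > fitness(b); });
    children.resize(poolSize);  // drop everything below the cut
    return children;
}

// Stochastic choice: every genome in the pool is equally likely to be picked.
const Genome& pickParent(const std::vector<Genome>& pool, std::mt19937& rng) {
    assert(!pool.empty());
    std::uniform_int_distribution<std::size_t> index(0, pool.size() - 1);
    return pool[index(rng)];
}
```

With 100 children per generation and a 10% setting, poolSize would be 10; converting the percentage into a count is left to the caller.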

What effect does the use of a breeding pool have on the number of generations needed to
evolve the target fitness? Present the results of your experiment, in the form of one or more
graphs, and a written summary of your findings (2-3 paragraphs).

(15 marks)

(c) Once we have a breeding pool in place, we can also use crossover to add diversity to the child
genomes we create, rather than just mutation. To create a new child genome using the
crossover technique, we select a number of parent genomes from the breeding pool, and a
number of crossover points.

For example, if we decide to use two parent genomes to create one child genome (rather
similar to the natural world), and have one randomly calculated crossover point from 1 to the
genome length (not so natural!), we may have this situation (wherein a and b represent genes
from parent 1 and parent 2):

Parent 1 = [a a a a a a a a]
Parent 2 = [b b b b b b b b]
Child = [a a a a b b b b]

wherein the crossover point in this case was (randomly) in the centre of the genome (gene
four).

If instead we wanted to use three parent genomes to create one child genome, and had four
crossover points, we might have the following situation:

Parent 1 = [a a a a a a a a]
Parent 2 = [b b b b b b b b]
Parent 3 = [c c c c c c c c]
Child = [a a b c c c a a]

wherein, in this case, since there was an insufficient number of parent genomes to furnish
each crossed-over section of the child genome, we cycled back to the first parent again.
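
The two examples above can be reproduced with the following C++ sketch. It is an illustration under assumptions, not a prescribed design: the names (Genome, crossover) are hypothetical, genes are stored as characters to mirror the a/b/c notation, and the crossover points are assumed to be supplied already sorted in ascending order. A crossover point of p means that the first p genes are copied before switching to the next parent.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Genome = std::vector<char>;  // characters mirror the a/b/c notation above

// Build one child by copying genes from the current parent and switching to the
// next parent (cycling back to the first when they run out) at each crossover point.
Genome crossover(const std::vector<Genome>& parents,
                 const std::vector<std::size_t>& crossoverPoints) {
    assert(!parents.empty());
    const std::size_t length = parents.front().size();
    for (const Genome& parent : parents) assert(parent.size() == length);

    Genome child;
    child.reserve(length);
    std::size_t parentIndex = 0;  // which parent we are currently copying from
    std::size_t nextPoint = 0;    // next crossover point not yet passed
    for (std::size_t gene = 0; gene < length; ++gene) {
        while (nextPoint < crossoverPoints.size() && crossoverPoints[nextPoint] <= gene) {
            parentIndex = (parentIndex + 1) % parents.size();  // cycle through parents
            ++nextPoint;
        }
        child.push_back(parents[parentIndex][gene]);
    }
    return child;
}
```

With two parents and the single point {4}, the child is [a a a a b b b b]. One placement of four points that yields [a a b c c c a a] from three parents is {2, 3, 6, 8}; this placement is an assumption, since the positions are not listed, and the point at 8 coincides with the genome length, so it has no visible effect.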

What effect does adding crossover capability have on the performance of the algorithm?
Present the results of your experiment, in the form of one or more graphs, and a written
summary of your findings (2-3 paragraphs).

(15 marks)
