Download as ppt, pdf, or txt
Download as ppt, pdf, or txt
You are on page 1of 24

Gene Expression Programming

CS678 - Machine Learning


Grand Valley State University
by James Bund
GEP is like GA and GP

GA: Linear strings of fixed length

GP: Nonlinear entities of different sizes and shapes

GEP: Linear strings of fixed length and


Nonlinear entities of different sizes and shapes
GEP Algorithm
Genotype and Phenotype
Attempts to represent nature more faithfully.
In GA and GP genotype and phenotype are the same.

Genotype: Genetic representation


Phenotype: Physical representation

Genotype -> Translation -> Phenotype


String chromosome -> Expression Tree

Fitness is a judgment of the phenotype


Reproduction involves the genotype
Encoding Genotype & Phenotype
( a b ) (c d ) 01234567
Q*+-abcd

Genotype is a fixed length string.


Resulting genotype is guaranteed to translate into a valid
expression tree.
Open Reading Frames (ORF)
Biology: Start codons and Stop codons

GEP: Start at the beginning of the coding sequence


Stops when the functions have been expressed,
so usually there are non-coding regions at the end.

0123456789
Q*+-abcdef
Head and Tail Regions
t = h (n-1) + 1

t = tail length
h = head length
n = maximum number of arguments to a function

012345678901234567890
+Q-/b*aaQbaabaabbaaab
Multigenic Chromosomes
Hierarchical discovery technique

012345678012345678
Q*Q+bbaaa*-babaabb
Multigenic Chromosomes
012345678901201234567890120123456789012
IIAIca3aa2acuNNAOab2u3c31cAu12ua3112cac

13 = h + t
t = h (n-1) + 1

h = 4, t = 9
Replication

Replication is copying without modification.


Mutation
012345678012345678012345678
-+-+abaaa/bb/ababb*Q*+aaaba

012345678012345678012345678
Q+-+abaaa/bbQababb*b*+aaaba

Resulting ET will be valid as long as the tail doesnt contain functions.


Insertion Sequence (IS)
Transposition
012345678901234567890012345678901234567890
*-+*a-+a*bbabbaabababQ**+abQbb*aabbaaaabba

012345678901234567890012345678901234567890
*-+*a-bba+babbaabababQ**+abQbb*aabbaaaabba

The position and the length of the IS must be determined.


Resulting ET will be valid, because the change is upstream.
Root Insertion Sequence (RIS)
Transposition
012345678901234567890012345678901234567890
-ba*+-+-Q/abababbbaaaQ*b/+bbabbaaaaaaaabbb

012345678901234567890012345678901234567890
-ba*+-+-Q/abababbbaaa+bbQ*b/+bbaaaaaaaabbb

The position and the length of the RIS must be determined.


Resulting ET will be valid, because the change is upstream.
Gene Transposition

012345678012345678012345678
*a-*abbab-QQ/aaabbQ+abababb

012345678012345678012345678
-QQ/aaabb*a-*abbabQ+abababb

No effect on genes, but can change the linking of genes.


1-Point Recombination
012345678012345678
-b+Qbbabb/aQbbbaab
/-a/ababb-ba-abaaa

012345678012345678
-b+/ababb-ba-abaaa
/-aQbbabb/aQbbbaab

Second gene was not modified.


Both offspring will always be valid.
2-Point Recombination
0123456789001234567890
+*a*bbcccac*baQ*acabab
*cbb+cccbcc++**bacbaab

0123456789001234567890
+*a*bbccbcc++*Q*acabab
*cbb+ccccac*ba*bacbaab

Both offspring will always be valid.


Gene Recombination
012345678012345678012345678
/aa-abaaa/a*bbaaab/Q*+aaaab
/-*/abbabQ+aQbabaa-Q/Qbaaba

012345678012345678012345678
/aa-abaaaQ+aQbabaa/Q*+aaaab
/-*/abbab/a*bbaaab-Q/Qbaaba
Block Stacking (SpellBot) Example
EQ=E TOS=t
DO=D TCB=c
MTT=T NBN=n
MTS=S
NOT=N

EQ(DO(MTT(TOS), NOT(TOS)), DO(MTS(NBN), NOT(NBN)))

012345601234560123456
DTNttttDSNnnnncNStcnc
Advantages of GEP over GA and GP
GAs have less complexity in what they can express.

GPs are harder to work with because their operators must


be smart enough to ensure syntactically correct offspring.

GEP combine these two features by having the ease of


working with them as GAs and the complexity of GPs.
Generated phenotypes are always syntactically correct and
dont need to be corrected.
The genetic operators dont need to handle complex
language syntax.
Multigenic Chromosome Benefits
Evolving components of a solution not just solutions.

Separation of tasks and abstraction of details like in the


engineering done by humans.

Similar to:
Structured Programming
Functions
Object Oriented
Design Patterns
You Can Have Extra Genes
Best Fitness vrs. Average Fitness
References
Gene Expression Programming: A New Adaptive Algorithm for Solving Problems
by Candida Ferreira
http://www.gene-expression-programming.com/webpapers/gep.pdf

Gene Expression Programming


http://www.gene-expression-programming.com

GeneXproTools
http://www.gepsoft.com

You might also like